Preface



The PKC 2001 conference was held at the Shilla Hotel, Cheju Island, Korea, 13-15
February 2001. Following the first conference, PKC 1998 in Yokohama, Japan,
PKC 1999 in Kamakura, Japan, and PKC 2000 in Melbourne, Australia,
PKC 2001, the fourth conference in the international workshop series, was
dedicated to practice and theory in public key cryptography.

The program committee of the conference received 67 submissions from 14 countries
and regions (Australia, Austria, China, Denmark, Estonia, France, Germany,
Greece, Korea, Singapore, Spain, Taiwan, UK, and USA), of which 30
were selected for presentation. All submissions were anonymously reviewed by
at least 3 experts in the relevant areas. Revisions were not checked, and the
authors bear full responsibility for the contents of their papers. In addition,
there were three invited talks by Jun-Cheol Yang of the Ministry of Information
and Communication, Korea; Mihir Bellare of the University of California at San
Diego, USA; and Ko Itoh of the Organization for Road System Enhancement,
Japan.

The program committee consisted of 20 experts in cryptography and data security
drawn from the international research community: Kwangjo Kim (Chair,
Information and Communications University, Korea), Claude Crepeau (McGill
University, Canada), Ed Dawson (Queensland University of Technology, Australia),
Yvo Desmedt (Florida State University, USA), Chi Sung Laih (National
Cheng Kung University, Taiwan), Pil Joong Lee (POSTECH, Korea), Arjen
Lenstra (Citibank, USA), Tsutomu Matsumoto (Yokohama National University,
Japan), David Naccache (Gemplus, France), Eiji Okamoto (University of
Wisconsin-Milwaukee, USA), Tatsuaki Okamoto (NTT Labs, Japan), Choonsik
Park (ETRI, Korea), Sung Jun Park (BCQRE, Korea), Josef Pieprzyk (University
of Wollongong, Australia), Claus Schnorr (Frankfurt University, Germany),
Nigel Smart (University of Bristol, UK), Jacques Stern (ENS, France),
Susanne Wetzel (Bell Labs, USA), Moti Yung (CertCo, USA), and Yuliang
Zheng (Monash University, Australia). Members of the committee spent numerous
hours reviewing the submissions and providing advice and comments
on the selection of papers.

The program committee also asked the expert advice of many of their colleagues,
including: Ingrid Biehl, Colin Boyd, Marco Bucci, Gary Carter, Seong Taek
Chee, Jean-Sebastien Coron, Nora Dabbous, Jean-François Dhem, Marc Fischlin,
Roger Fischlin, Pierre Girard, Jaeseung Go, Juanma Gonzalez-Nieto, Helena
Handschuh, Marie Henderson, Markus Jakobsson, Marc Joye, Jinho Kim,
Seungjoo Kim, Ju Seung Kang, Tri V. Le, Byoungcheon Lee, Hyejoo Lee, Phil
MacKenzie, David M'Raïhi, Renato Menicocci, Bernd Meyer, Pascal Paillier,
Sang Joon Park, Beatrice Peirani, Jason Reid, Hein Roehrig, Amin Shokrollahi,
Igor Shparlinski, Jessica Staddon, Ron Steinfeld, Christophe Tymen, and Kapali
Viswanathan. We apologize for any omission in this list.
On the Security of a Williams Based Public Key
Encryption Scheme 



Siguna Müller*

University of Klagenfurt, Dept. of Math., A-9020 Klagenfurt, Austria
siguna.mueller@uni-klu.ac.at



Abstract. In 1984, H.C. Williams introduced a public key cryptosys- 
tem whose security is as intractable as factorization. Motivated by some 
strong and interesting cryptographic properties of the intrinsic struc- 
ture of this scheme, we present a practical modification thereof that has 
very strong security properties. We establish, and prove, a generaliza- 
tion of the “sole-samplability” paradigm of Zheng-Seberry (1993) which 
is reminiscent of the plaintext-awareness concept of Bellare et al. The
assumptions that we make are both well-defined and reasonable. In par- 
ticular, we do not model the functions as random oracles. In essence, 
the proof of security is based on the factorization problem of any large 
integer n = pq and Canetti’s “oracle hashing” construction introduced 
in 1997. Another advantage of our system is that we do not rely on any 
special structure of the modulus n = pq, nor do we require any specific 
form of the primes p and q. As our main result we establish a model 
which implies security attributes even stronger than semantic security 
against chosen ciphertext attacks. 

Keywords: Chosen Ciphertext Security, Plaintext Awareness, (Weak)- 
Sole-Samplability, Factorization Intractability, Oracle Hashing, Williams’ 
Encryption Scheme 



1 Introduction and Summary 

1.1 Provable Security and Attack Models 

A desirable property of any cryptosystem is a proof that breaking it is as diffi- 
cult as solving a computational problem that is widely believed to be difficult. 
A cryptographic scheme is provably secure if an attack on the scheme implies 
an attack on the underlying primitives it employs. While RSA is undoubtedly 
the most well-known and widely used public-key cryptosystem, it is not known 
if breaking RSA is as difficult as factoring (cf. [6]). A variety of factorization 
equivalent RSA modifications have been proposed which are essentially based 
on the same idea of unambiguous decryption (cf. also [18]). The sender can ma- 
nipulate the decoder to decrypt a ‘wrong message’ which then can be used to 
factorize the modulus. Because of this problem, all these systems are vulnerable 

* Supported by the Austrian Science Fund (FWF), P 13088-MAT and P 14472-MAT 

K. Kim (Ed.): PKC 2001, LNCS 1992, pp. 1-18, 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001
to a chosen ciphertext attack (CCA). Under such an attack the adversary selects
the ciphertext and is then given the corresponding plaintext. The strongest such 
attack is known as the adaptive CCA [19], in which an attacker can access a de- 
cryption oracle on arbitrary ciphertexts (except for the target ciphertexts which 
he is challenged with). 

It is known that plain RSA can be broken under a CCA [21], which allows total
recovery of a complete plaintext, or generation of a complete signature on
an entire message. But RSA is also vulnerable to attacks that compromise the
semantic security of the scheme. An adaptive CCA can successfully be mounted
on some randomized versions of RSA (PKCS #1), when only partial information
on the plaintext is leaked [5].

The underlying goal of any encryption scheme is to achieve semantic secu- 
rity (informally, ‘whatever can be computed by an attacker about the plaintext 
given an object ciphertext, can also be computed without the object ciphertext’) 
under strong attack models (such as CCA) under well-specified assumptions and 
primitives. 



1.2 The General Goal of this Paper 

The two most often applied cryptographic primitives are the Diffie-Hellman (DH)
problem and the factorization problem. There are a number of systems secure
against CCA which are based on the DH problems, e.g., on the decisional DH
and the existence of a collision resistant hash function [10], on the decisional DH
in the random oracle model (ROM) [22], and on the computational DH in the
ROM [1,17]. Also, suggestions have been made which are based on various new
primitives (cf. e.g. [17]), but no encryption scheme secure against CCA has been
published yet which relies on the difficulty of factoring arbitrary numbers in a
model without random oracles. Very recently, proposals have been made [16,17]
of encryption schemes whose security relies on the ROM and which additionally
require the very specific structure of the modulus, $n = p^2 q$.

Most of the above methods require random oracles. Although the ROM is a 
convenient setting, we do not have a general mechanism for transforming pro- 
tocols that are secure in the ROM into protocols that are secure in real life. 
Actually, it is proved [8] that there are schemes which are secure in the ROM, 
but have no secure implementation in the “real world”. Moreover, we do not 
even know how to specify the properties for a transformation from the ROM 
into the real world. A natural goal thus is to design a chosen ciphertext secure 
system which is practical and proven secure under well defined intractability 
assumptions.

On the other hand, although it is not known to what extent there exist algorithms
that can exploit a special structure of the modulus for more efficiently
factoring n, it would be desirable to establish a scheme based on the general
factorization primitive n = pq with p and q arbitrary. The only factorization
equivalent RSA modification known that does not require a specific form of the
modulus, nor any special structure of the primes, is the Williams scheme [23].
Our proposed suggestion will consist of enhancing this scheme in order to obtain
very strong cryptographic properties. 

Decrypting the Williams system is provably equivalent in difficulty to factoring
n = pq. However, it is vulnerable to a CCA. The main result of the paper is to
enhance this system. We will establish a new model which yields properties
even stronger than security against CCA.



1.3 Previous Methods for Proving CCA Security 

There are several methods for proving security against adaptive CCA.¹

Typically, in the ROM, semantic security against CCA is achieved by proving 
semantic security against chosen plaintext attacks (CPA) [2], and successively 
proving that the system is plaintext aware [2]. In the first definition given in 
[3] this basically meant that an adversary cannot produce a ciphertext without 
knowing (‘being able to compute’) the corresponding plaintext. 

The original definition required some modification. This is due to the way
in which a valid ciphertext² is created. If its creation involves some internal
RO-hash queries, the adversary that produced the ciphertext would not be able to
compute the underlying plaintext [2]. The refined definition given in [2] involves
some plaintext extractor which serves as a simulator of the decryption oracle. 
The extractor is required to find the underlying plaintext to a ciphertext without 
making any queries to the decryption oracle. A necessary requirement for the 
plaintext extractor to be successful is that the generation of the ciphertext only 
involved direct RO-queries. In that case, decryption can be simulated by the 
extractor, otherwise, it cannot. The fact that there exist some valid ciphertexts 
that cannot be decrypted by the simulator immediately leads to a smaller success 
rate of any CCA-attacker and to some loss of ‘advantage’ [2]. 

For practical realisations [1,17] the problem firstly consists in showing that 
such a plaintext extractor exists. Secondly, several probability estimates are 
necessary to ensure that the failure probability of the simulator remains small 
enough. 

Moreover, plaintext awareness (PA), as defined by Bellare et al., has only
been defined in the ROM. In [2] it is argued why this concept would not make
sense in the standard model.

A more direct approach for proving security against CCA was done in [10]. 
It is shown that if their scheme could be broken under a CCA then this would 
lead to some method for breaking the underlying primitive (the decisional DH 
problem).

In [20] it was recently shown that under certain settings security against 
adaptive CCA is not even enough. Schnorr-Jakobsson demonstrate new and 
reasonable attack models (‘one-more attack’) which cannot be covered by CCA 

¹ We do not consider the multi-user setting, as this would require additional features
going beyond the scope of this paper.

² A valid ciphertext is usually understood as one that passes the validity test and hence
does not get rejected by the decryption oracle.
security. Indeed, they show that the most important and general attack models
can be captured by some sort of proof of knowledge, which is also called plaintext 
awareness in [20] but is different from the definition of PA given in [2]. The 
PA in [20] also requires that any party that creates a valid ciphertext, must 
‘know’ the secret parameters involved in its creation (for details we refer to 
[20]). Although the arguments of [20] clearly demonstrate that security against 
CCA is not sufficient, their method requires the ROM as well and thus cannot 
be applied to our proposed scheme. 

The idea of incorporating some proof of knowledge in proving security against 
CCA goes back to [11,15]. Although these suggestions do not require the ROM, 
they are quite impractical as they rely on general and expensive constructions 
which make these cryptosystems difficult to realize in practice. 

The first practical approach for establishing security against adaptive CCA
without the ROM was proposed by Zheng-Seberry as early as 1993 [26]. They
require their encoding functions f to be sole-samplable. Basically, this property
means that there is no other way to generate any valid ciphertext than to first
choose a plaintext x and evaluate f at x. Thus, an adversary cannot generate a
new valid ciphertext without starting from a known plaintext.

The underlying idea is obvious. If the party that generates a valid ciphertext
must know the corresponding plaintext then it cannot abuse the system,
as it must have known the result of any decryption query to begin with.
Sole-samplability is one of the strongest notions of security that exists. It would
automatically imply non-malleability [11], security against adaptive CCA,
and also security against the one-more attack. Additionally, it does not require the ROM.

The problem with the Zheng-Seberry suggestion is that they were not able
to prove that their functions are indeed sole-samplable. They merely base the
proof of CCA security on this assumption. Although their concept seems to be
the strongest, the difficulty is to actually achieve it.



1.4 The New Method and Our Main Results 

We suggest that the underlying primitive to be chosen in the standard model 
must be some form of sole-samplability. Obviously the most natural and impor- 
tant concept to be established is some ‘proof of knowledge’, as plaintext aware- 
ness in the ROM, or the Schnorr-Jakobsson plaintext awareness in the generic 
model. Our main results are the following. 

- We introduce a comparable notion to the sole-samplability paradigm of
Zheng-Seberry. Although the proposed concept is slightly weaker than theirs,
it has the advantage that all the established claims can rigorously be proved.
We call an encryption scheme weak-sole-samplable if the following conditions
hold. If C is a valid ciphertext then it either has to be the result
of an encryption query, or it has to be the result of some specific function
(algorithm) F. In the latter case, this F must be explicitly known, and
additionally, it must be possible to efficiently generate the underlying plaintext
with publicly available information only.

- This means that if there do exist ways to find a valid ciphertext other than
running the encryption oracle, then all these other ways must be explicitly
known. Also, whenever a valid ciphertext was generated by such an (explicitly
known) alternative method, then it must be possible to find the corresponding
plaintext without having to make any decryption-oracle queries.

- The advantages of this concept are obvious. If all the ways of establishing
valid ciphertexts are known, and if none of these cases is possible apart from
knowing the underlying plaintext, then the behaviour of any adversary is
the same as in the Zheng-Seberry model, which implies extremely strong
security attributes. The adversary cannot obtain more information via any
decryption-oracle queries, as he must have known the answers to begin with.

- Another advantage is that this eliminates the need for a simulator and,
additionally, the necessity for establishing the failure probability of any plaintext
extractor (simulator). Any adversary that creates a valid ciphertext is
always successful in finding the corresponding plaintext. There is no
valid ciphertext that can be created without the plaintext.

- We establish the proof of ‘weak-sole-samplability’ on well-formulated and
explicit properties. We only require Canetti’s oracle hash functions and the
general factorization primitive. To the best of our knowledge this is the first
proposal that does not require any structure of the modulus, nor any special
form of the primes. Additionally, the scheme remains quite practical and can
efficiently be realized by means of very rapid methods for the evaluation of
combined Lucas sequences [24,25].

From the RSA family, the only factorization equivalent scheme for arbitrary
p, q in the modulus n = pq is the Williams scheme. This system has a number
of very interesting properties. Indeed, it was a better understanding of the
intrinsic structure of this scheme that led us to establish the new model and
the enhanced security properties. Since ‘weak-sole-samplability’ is strongly based
on underlying properties of the Williams system, we present it in terms of this
particular system.

Outline: After a short description of the Williams scheme and some essential 
properties thereof (section 2), we present the proposed enhanced version (section 
3). Semantic security against CPA will be derived in section 4.1. In section 4.2 
we finally prove the property ‘weak-sole-samplability’. 

2 Some Preliminaries 

2.1 The Underlying Williams Scheme 

Let $\alpha, \bar\alpha$ be the distinct roots of $x^2 - Px + Q$ for $P, Q \in \mathbb{Z}$ with $Q \neq 0$ and
discriminant $D = P^2 - 4Q$. Then the Lucas sequences of the first and second
kind of degree $k$ are defined by $U_k(P,Q) = \frac{\alpha^k - \bar\alpha^k}{\alpha - \bar\alpha}$ and $V_k(P,Q) = \alpha^k + \bar\alpha^k$,
respectively. It follows that these are sequences of integers that fulfill a number
of interesting identities and arithmetical properties [25].
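
As a quick illustration of the definition, both sequences satisfy the linear recurrence $T_{k+1} = P\,T_k - Q\,T_{k-1}$ with $U_0 = 0$, $U_1 = 1$, $V_0 = 2$, $V_1 = P$. The following is a minimal Python sketch of that recurrence; the function name `lucas_uv` and the optional modulus argument are illustrative choices, not taken from the paper, and the plain O(k) loop stands in for the much faster combined evaluation methods of [24,25].

```python
def lucas_uv(P, Q, k, n=None):
    """Return (U_k(P,Q), V_k(P,Q)), reduced modulo n when n is given.

    Starts from U_0 = 0, U_1 = 1, V_0 = 2, V_1 = P and applies the shared
    recurrence T_{j+1} = P*T_j - Q*T_{j-1}. Illustrative O(k) loop only.
    """
    u_prev, u_cur = 0, 1
    v_prev, v_cur = 2, P
    if k == 0:
        return (u_prev, v_prev)
    for _ in range(k - 1):
        u_prev, u_cur = u_cur, P * u_cur - Q * u_prev
        v_prev, v_cur = v_cur, P * v_cur - Q * v_prev
        if n is not None:
            # Keep intermediate values small when working modulo n.
            u_cur %= n
            v_cur %= n
    if n is not None:
        return (u_cur % n, v_cur % n)
    return (u_cur, v_cur)
```

For $P = 1$, $Q = -1$ the two sequences are the Fibonacci and the classical Lucas numbers, which gives an easy sanity check.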
Williams [23] utilizes the Lucas sequences for the special case where $P = 2a$
and $Q = 1$. Then, if $p$ is an odd prime, one obtains the fundamental congruence
$\alpha^{p - \left(\frac{D}{p}\right)} \equiv \bar\alpha^{p - \left(\frac{D}{p}\right)} \equiv 1 \bmod p$. Analogously as in Rabin’s case, the basis
of the system is the congruence $\alpha^{(p - (\frac{D}{p}))/2} \equiv \bar\alpha^{(p - (\frac{D}{p}))/2} \equiv \pm 1 \bmod p$.
Williams develops a method to specify the correct signs. When working modulo
$n = pq$, and for $e, d$ the public and private key, respectively, he obtains
$\alpha^{2ed} \equiv \pm\alpha \bmod n$ and $\bar\alpha^{2ed} \equiv \pm\bar\alpha \bmod n$. This then establishes the equivalence between
decryption and factoring [23].

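Let $n = pq$, where $p$ and $q$ are two large primes. Further, let $s, c \in \mathbb{Z}_n^*$ be
chosen such that $p \equiv -\left(\frac{c}{p}\right) \bmod 4$ and $q \equiv -\left(\frac{c}{q}\right) \bmod 4$, $\gcd(s^2 - c, n) = 1$
and $\left(\frac{s^2 - c}{n}\right) = -1$. In the following $w \in \mathbb{Z}_n$ is assigned the role of the message
to be encrypted.

Let the public encryption key $e$ and the secret decryption key $d$ with
$\gcd(e, (p + 1)(q + 1)) = 1$ be determined according to $ed \equiv \frac{m+1}{2} \bmod m$, where
$m = \frac{1}{4}\left(p - \left(\frac{c}{p}\right)\right)\left(q - \left(\frac{c}{q}\right)\right)$. The numbers $n, e, c, s$ constitute the Public Key, whereas
the numbers $p, q, m, d$ are kept secret. Throughout the paper, let $b_1 = 1$ if
$\left(\frac{w^2 - c}{n}\right) = 1$, and $b_1 = -1$ if $\left(\frac{w^2 - c}{n}\right) = -1$.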

Suppose $\gcd(w^2 - c, n) = 1$ and denote $a = a(w)$, $b = b(w) \bmod n$, and
$\alpha = \alpha(w) = a + b\sqrt{c} \bmod n$, where³

$$
\begin{aligned}
\text{for } b_1 = 1:&\quad \alpha = \frac{w + \sqrt{c}}{w - \sqrt{c}}, \quad a = \frac{w^2 + c}{w^2 - c}, \quad b = \frac{2w}{w^2 - c} \bmod n,\\[1ex]
\text{for } b_1 = -1:&\quad \alpha = \frac{(w + \sqrt{c})(s + \sqrt{c})}{(w - \sqrt{c})(s - \sqrt{c})}, \quad a = \frac{(w^2 + c)(s^2 + c) + 4csw}{(w^2 - c)(s^2 - c)}, \quad b = \frac{2s(w^2 + c) + 2w(s^2 + c)}{(w^2 - c)(s^2 - c)} \bmod n.
\end{aligned}
\tag{1}
$$

Define the sequences $X_i(a) = \frac{\alpha^i + \bar\alpha^i}{2} = \frac{V_i(2a, 1)}{2}$, and $Y_i(a, b) = b\,\frac{\alpha^i - \bar\alpha^i}{\alpha - \bar\alpha} = b\,U_i(2a, 1)$.

In order to minimize problems concerning the existence of the above multiplicative
inverses mod n (cf. [14]) it is preferable to work with a slightly modified
version of the original scheme. In the following we will exclusively be applying
this modification, which essentially consists of reversing numerator and denominator
of the original encryption function $X_e(a)/Y_e(a, b) \bmod n$ of [23] and adapting
the decryption scheme.
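
As a consistency check on (1), the case $b_1 = 1$ can be expanded by hand. The sketch below assumes the reading $\alpha = (w + \sqrt{c})/(w - \sqrt{c})$ with conjugate $\bar\alpha = a - b\sqrt{c}$; it shows where $a(w)$ and $b(w)$ come from and why $\alpha\bar\alpha = 1$, which matches the choice $Q = 1$, $P = \alpha + \bar\alpha = 2a$ for the Lucas sequences.

```latex
\begin{align*}
\alpha &= \frac{w+\sqrt{c}}{w-\sqrt{c}}
        = \frac{(w+\sqrt{c})^2}{w^2-c}
        = \underbrace{\frac{w^2+c}{w^2-c}}_{a(w)}
          + \underbrace{\frac{2w}{w^2-c}}_{b(w)}\sqrt{c},\\[1ex]
\alpha\bar\alpha &= a^2 - c\,b^2
        = \frac{(w^2+c)^2 - 4cw^2}{(w^2-c)^2}
        = \frac{(w^2-c)^2}{(w^2-c)^2} = 1 .
\end{align*}
```

The case $b_1 = -1$ works the same way after multiplying by $(s + \sqrt{c})/(s - \sqrt{c})$, which again has norm 1.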



Williams’ Encryption: The first step of the encryption process consists of
calculating $a(w), b(w)$ from the message $w$ by means of (1). Then $w$ is encoded
as

$$
E(w) = \frac{Y_e(a(w), b(w))}{X_e(a(w))} \bmod n.
$$

The cryptogram $C$ to be transmitted is the triple $[E(w), b_1, b_2]$, where $b_1$ is
defined via $\left(\frac{w^2 - c}{n}\right)$ as above and $b_2 \equiv a(w) \bmod 2$, $b_2 \in \{0, 1\}$.

³ It turns out that the quantities $a(w)$ and $b(w)$ for both cases $b_1 = 1$ and $-1$ can be
comprised into a more comprehensive formula. It can easily be shown that for $a$ and $b$
as above, $a = a(w) = \frac{\tilde w^2 + c}{\tilde w^2 - c} \bmod n$, $b = b(w) = \frac{2\tilde w}{\tilde w^2 - c} \bmod n$, where $\tilde w = w \bmod n$
if $b_1 = 1$, and $\tilde w = \frac{ws + c}{w + s} \bmod n$ if $b_1 = -1$.
Williams’ Decryption. Upon receiving $C$ the receiver firstly calculates the
values $a_0 = \frac{1 + cE(w)^2}{1 - cE(w)^2} \bmod n$, and $b_0 = \frac{2E(w)}{1 - cE(w)^2} \bmod n$.

The second step consists of determining $\sigma = (-1)^{b_2 - X_d(a_0)}$ and $a(w)$ and
$b(w)$ by means of $a(w) = \sigma X_d(a_0) \bmod n$ and $b(w) = \sigma Y_d(a_0, b_0) \bmod n$.
Finally, the message $w$ can be retrieved from $a(w)$ and $b(w)$ via

$$
w = \begin{cases}
\dfrac{a(w) + 1}{b(w)} \bmod n, & \text{if } b_1 = 1,\\[2ex]
\dfrac{c\,b(w) - s\,(a(w) + 1)}{a(w) + 1 - s\,b(w)} \bmod n, & \text{if } b_1 = -1,
\end{cases}
\tag{2}
$$

provided⁴ $\gcd(b(w), n) = 1$ for $b_1 = 1$, and $\gcd(a(w) + 1 - s\,b(w), n) = 1$ for
$b_1 = -1$.
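
To make the encryption and decryption steps concrete, here is a toy round trip in Python. It is a sketch under several assumptions that are not from the paper: the modulus $n = 13 \cdot 17$ and the values $c = 5$, $s = 8$, $e = 5$, $d = 19$ were picked by hand so that $c$ is a non-residue modulo both primes, $\left(\frac{s^2 - c}{n}\right) = -1$, and $ed \equiv (m+1)/2 \bmod m$ with $m = 63$ (Williams' key relation as read here); powers of $\alpha = a + b\sqrt{c}$ are computed directly in $\mathbb{Z}_n[\sqrt{c}]$ instead of through Lucas sequences; and $a_0, b_0$ are obtained from $\alpha^{2e} = (1 + E\sqrt{c})/(1 - E\sqrt{c})$. All function names are hypothetical.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def inv(x, n):
    """Modular inverse; raises ValueError if gcd(x, n) != 1."""
    return pow(x % n, -1, n)

def qpow(u, k, c, n):
    """k-th power of u[0] + u[1]*sqrt(c) in Z_n[sqrt(c)] (square-and-multiply)."""
    def mul(x, y):
        return ((x[0] * y[0] + c * x[1] * y[1]) % n, (x[0] * y[1] + x[1] * y[0]) % n)
    r = (1, 0)
    while k:
        if k & 1:
            r = mul(r, u)
        u = mul(u, u)
        k >>= 1
    return r

def alpha_of(w, b1, c, s, n):
    """(a(w), b(w)) following equation (1)."""
    if b1 == 1:
        den = inv(w * w - c, n)
        return (w * w + c) * den % n, 2 * w * den % n
    den = inv((w * w - c) * (s * s - c), n)
    a = ((w * w + c) * (s * s + c) + 4 * c * s * w) * den % n
    b = (2 * s * (w * w + c) + 2 * w * (s * s + c)) * den % n
    return a, b

def encrypt(w, e, c, s, n):
    b1 = jacobi(w * w - c, n)
    a, b = alpha_of(w, b1, c, s, n)
    X, Y = qpow((a, b), e, c, n)            # alpha^e = X_e + Y_e*sqrt(c)
    return Y * inv(X, n) % n, b1, a % 2     # cryptogram [E(w), b_1, b_2]

def decrypt(C, d, c, s, n):
    E, b1, b2 = C
    den = inv(1 - c * E * E, n)
    a0, b0 = (1 + c * E * E) * den % n, 2 * E * den % n   # alpha^(2e)
    X, Y = qpow((a0, b0), d, c, n)          # equals +/-(a(w), b(w))
    if X % 2 != b2:                         # parity bit b_2 selects the sign
        X, Y = -X % n, -Y % n
    if b1 == 1:
        return (X + 1) * inv(Y, n) % n      # equation (2), case b_1 = 1
    return (c * Y - s * (X + 1)) * inv(X + 1 - s * Y, n) % n

# Hand-picked toy parameters (assumptions, far too small for real use).
N, C, S, E_KEY, D_KEY = 13 * 17, 5, 8, 5, 19
```

With these values, `decrypt(encrypt(7, E_KEY, C, S, N), D_KEY, C, S, N)` returns the original message; with a modulus this small some messages violate the gcd-conditions and raise `ValueError` instead.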



Remark 1. By utilizing efficient methods for the combined evaluation of the 
Lucas sequences [24,25], it can be shown that the Williams scheme requires 
about twice as many multiplications as RSA, with additionally two multiplicative 
inverses modulo n for both encryption and decryption. 



2.2 The Williams Scheme Under a CCA 

Definition 1. Let $a'(w)$ and $b'(w)$ be chosen such that $a'(w), b'(w)$ correspond
to $a(w), b(w)$ for the (wrong) case $b_1' = -b_1$.

Further denote the false encryption of $w$ by $E'(w) = \frac{Y_e(a'(w), b'(w))}{X_e(a'(w))} \bmod n$,
that is defined by following the formulas of the above encryption routine with
respect to $b_1' = -b_1$ (rather than $b_1$).

As with the Rabin scheme, the equivalence of decryption and factorization 
gives rise to a CCA [23]. One can even show the following [14]. 

Proposition 1.
- If $\left(\frac{w^2 - c}{n}\right) = 1$ or $-1$ then $E'(w) = E(z) \bmod n$ and $b_1'(w) = b_1(z)$.
Then $z = D(E(z)) \bmod n$ where the parameters for the decryption
routine are $b_1'(w)$ and $b_2$. Then $\gcd(w - z, n)$ gives the factorization of $n$.

- For $\left(\frac{w^2 - c}{n}\right) = -1$ and $E'(w) = E(z) \bmod n$, the problem of finding $a(z)$
for a known $a'(w)$ respectively $w$ (and, similarly, for $b(z)$) is computationally
equivalent to the problem of factorizing $n$.

- If there exists an algorithm for retrieving $\pm a(w) \bmod n$ from $E(w)$ (where
both values correspond to the same $b_1$) then there exists an efficient algorithm
for factorizing $n$.



2.3 Some Interesting Properties 

Proposition 2. Let $b_1$ be fixed and $E(w)$ as well as $a(w)$ be given. Then there
is an efficient algorithm for evaluating the underlying message $w$.

⁴ It was shown in [14] that the number of messages not fulfilling these gcd-conditions
is negligible. 
Proof. For establishing this result we adopt the ideas of the attack developed in
[4,9,12]. Let $A = a(w) - 1 = \frac{2c}{\tilde w^2 - c}$ and $B = 2A^{-1} + 1$. Then $\tilde w^2 = cB$. We may
assume that $A^{-1} \bmod n$ exists.

We now consider the extension $R = \mathbb{Z}[x]/(x^2 - cB, n)$, i.e. the elements of $R$
are polynomials of degree 1 at most with coefficients modulo $n$. All arithmetic
operations (addition, multiplication, division) over $R$ can be done without the
knowledge of the factorization of $n$ (where we assume that practically division is
always possible). We now define a mapping $\phi : R \to \mathbb{Z}_n$ by $\phi(kx + l) = k\tilde w + l \bmod n$
for $k, l \in \mathbb{Z}_n$, where according to the value $b_1$, $\tilde w$ equals $w$, respectively $\frac{ws + c}{w + s}$.
Since $\tilde w^2 = cB \bmod n$ it can easily be verified that $\phi$ is a ring homomorphism.

We show the result for $b_1 = -1$, since $b_1 = 1$ can be proved analogously.
In particular, we then have $\phi(x - s) = \tilde w - s = \frac{c - s^2}{w + s}$, $\phi(-xs + c) = \frac{w(c - s^2)}{w + s}$,
and consequently $\phi\left(\frac{-xs + c}{x - s}\right) = w$. In $R$ the expression $\frac{-xs + c}{x - s}$ becomes
$\frac{c - s^2}{cB - s^2}\,x + \frac{cs(1 - B)}{cB - s^2} \bmod n$, which we will denote by $w_1 x + w_2$. Observe that, although we do
not know the message $w$, we do know the polynomial that maps onto $w$, that is,
$w$ is now implicitly given by $w_1 x + w_2$.

The idea behind the attack in [4,9,12] is now to encrypt this polynomial in
$R$, which gives us a polynomial in $x$. The homomorphic image of this encrypted
polynomial then equals $E(w)$ since $\phi$ is a homomorphism. The combined knowledge
of $E(w)$ and this homomorphic image then can be used to derive $w$.

To encrypt the polynomial $w_1 x + w_2$ we follow the routines w.r.t. a fixed $b_1$.
We firstly have to find the corresponding values to $a(w)$ and $b(w)$ in $R$. Since
$\phi(x) = \tilde w$ and $x^2 = cB$ in $R$, this can easily be shown to be accomplished. One
obtains $a(w) = \frac{B + 1}{B - 1}$, $b(w) = \frac{2x}{c(B - 1)} \bmod n$ in $R$.

Consequently, one evaluates the Lucas sequences w.r.t. the $a(w)$ in $R$ modulo
$n$ and obtains the encryption in $R$. Let this result be denoted as $ux + v$. We stress
that, since $a(w)$ and $b(w)$ in $R$ merely consist of the public information $c, B$, we
know $u$ and $v$. But then we know $\phi(ux + v)$ which equals $E(w)$ and therefore
we have $u\tilde w + v = E(w)$. Hence, we can solve for $\tilde w$ if $\gcd(u, n) = 1$, which is very
likely. Finally we now obtain $w$ from $\tilde w$ as desired. □
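
The recovery in Proposition 2 can be sketched numerically. The Python fragment below takes an equivalent shortcut rather than building the ring $R$ explicitly: since $a(w)$ is known, $X_e(a)$ and $U_e(2a, 1)$ are computable from public data, and $E(w) = b(w)\,U_e(2a,1)/X_e(a)$ is linear in the unknown $b(w)$, so $b(w)$ and then $\tilde w = (a(w) + 1)/b(w)$ follow directly. The function names and the toy values ($n = 221$, $c = 5$, $s = 8$, $e = 5$) are assumptions for illustration only; the slow O(e) Lucas loop is not the evaluation method of [24,25].

```python
def lucas_uv_mod(P, k, n):
    """(U_k(P,1), V_k(P,1)) mod n for k >= 1, via T_{j+1} = P*T_j - T_{j-1}."""
    u_prev, u_cur = 0, 1
    v_prev, v_cur = 2, P % n
    for _ in range(k - 1):
        u_prev, u_cur = u_cur, (P * u_cur - u_prev) % n
        v_prev, v_cur = v_cur, (P * v_cur - v_prev) % n
    return u_cur, v_cur

def inv(x, n):
    """Modular inverse; raises ValueError if gcd(x, n) != 1."""
    return pow(x % n, -1, n)

def recover_message(E, a, b1, e, c, s, n):
    """Recover w from E(w), a(w) and b_1 without the secret key."""
    U, V = lucas_uv_mod(2 * a, e, n)
    X = V * inv(2, n) % n                 # X_e(a) = V_e(2a,1)/2
    b = E * X * inv(U, n) % n             # solve E = b*U_e/X_e for b(w)
    w_tilde = (a + 1) * inv(b, n) % n     # w~ = (a+1)/b
    if b1 == 1:
        return w_tilde                    # w~ = w
    # For b_1 = -1, invert w~ = (w*s + c)/(w + s).
    return (c - w_tilde * s) * inv(w_tilde - s, n) % n
```

For instance, with the toy values above the pair $E(w) = 66$, $a(w) = 172$, $b_1 = 1$ leads back to $w = 7$.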

Remark 2. It is essential that the homomorphic image of the encrypted polynomial,
which is determined by $a = a(w)$, equals $E(w)$. If $a$ were some $a(x)$, the
results obtained would be different from $w$. In other words, to each $E(w) = E'(z)$
correspond exactly two possible $a$, namely $a(w)$ and $a(z)$. Observe that from a
given pair $E(w), a$ the output $z$ can only be obtained when the factorization of
$n$ is known (cf. Proposition 1).

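Proposition 3. If $b_1 = -1$ and $a(x) = a(y) \bmod n$, then $\tilde x^2 = \tilde y^2 \bmod n$, where
$\tilde x = \frac{xs + c}{x + s} \bmod n$ and $\tilde y = \frac{ys + c}{y + s} \bmod n$.

Proof. From $a(x) = \frac{\tilde x^2 + c}{\tilde x^2 - c} \bmod n$ we see that $\tilde x^2 = c\,\frac{a(x) + 1}{a(x) - 1} \bmod n$, and,
analogously, for $a(y)$, $\tilde y^2 = c\,\frac{a(y) + 1}{a(y) - 1} \bmod n$. By hypothesis the right hand sides are
equal, which gives the result. □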







Proposition 4. For all $x \in \mathbb{Z}_n^*$ we have $-a(x) = a(c/x) \bmod n$.

Proof. Observe that $-a(x) \bmod n$ corresponds to the situation where during
decryption the wrong $\sigma = \sigma(a(x))$ is obtained. By footnote 7 the decryption
routine evaluates $x(-\sigma) = \frac{c}{x} \bmod n$. Since also $\tilde x(-\sigma) = \frac{c}{\tilde x} \bmod n$, the definition
of $a$ gives $a(x(-\sigma)) = -a(x) \bmod n$. □
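
For the case $b_1 = 1$, where $a(x) = \frac{x^2 + c}{x^2 - c}$, the identity of Proposition 4 can also be checked by direct substitution. The following is a worked sketch rather than the paper's own argument:

```latex
\begin{align*}
a\!\left(\frac{c}{x}\right)
  = \frac{(c/x)^2 + c}{(c/x)^2 - c}
  = \frac{c^2 + c\,x^2}{c^2 - c\,x^2}
  = \frac{c + x^2}{c - x^2}
  = -\,\frac{x^2 + c}{x^2 - c}
  = -a(x) \pmod n .
\end{align*}
```

For $b_1 = -1$ the same computation applies with $\tilde x = \frac{xs + c}{x + s}$ in place of $x$, because replacing $x$ by $c/x$ turns $\tilde x$ into $c/\tilde x$.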



Corollary 1. If for the case $b_1 = -1$ one has $a(x) = -a(y) \bmod n$ then $\tilde x^2 =
(c/\tilde y)^2 \bmod n$.



3 The Proposed Scheme 

3.1 Requirements on the Hash Function 

Usually, semantic security is achieved via random oracles. Due to the ongoing 
controversy about the existence of such ‘truly random’ hash functions, we design 
our scheme in a way where we do not require the ROM. Instead, all our hash 
functions involved are special instances of Canetti’s oracle hash functions. 
For the exact definitions we refer to [7] and only recall the fundamental concepts 
required for our scheme. The primitive, oracle hashing, informally describes a 
hash function h that, like random oracles, ‘hides all partial information on its
input’.

A salient property of oracle hashing is that it cannot be deterministic, which 
traditionally is the case with any hash function, where two invocations on the 
same input yield the same answer. However, any deterministic function F is 
inadequate for oracle hashing, since it is bound to disclose some information on 
the input, as F(x) itself is some information on x. 

Thus, oracle hash functions need to be probabilistic in the sense that different
invocations on the same input result in different outputs. The hash of x is additionally
determined by some randomizer r which is responsible for the different
hash values of x. That is, the hash of x is the output of h(x, r) for the random
value r. Still, there needs to be some means to verify whether a given hash
value was generated from a given input x. There needs to exist a verification⁵
algorithm, V, that correctly decides, given x and y, whether y is a hash of x. We
use Canetti’s suggestion of a public randomness scheme. The randomizer r
appears directly in the output of h(x,r). We write h(x,r) = r,h(x,r).

The fundamental property of our underlying hash functions is Canetti’s or- 
acle indistinguishability. Informally, the hashes of x and y with respect to 
the same randomizer r, h(x,r) and h(y,r), should be computationally indistin- 
guishable to any polytime adversary. 
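
The interface just described (a probabilistic hash whose output carries its randomizer, plus a public verification algorithm V) can be sketched as follows. This is only an illustration of the public randomness idea: SHA-256 is used as a stand-in and is not claimed to satisfy Canetti's oracle indistinguishability, and the names `oracle_hash` and `verify` as well as the 16-byte randomizer length are assumptions, not part of [7] or of the proposed scheme.

```python
import hashlib
import secrets
from typing import Optional, Tuple

def oracle_hash(x: bytes, r: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Probabilistic hash with public randomness: returns (r, digest).

    A fresh randomizer r is drawn when none is supplied, so two calls on
    the same input x give different outputs. SHA-256 is a placeholder.
    """
    if r is None:
        r = secrets.token_bytes(16)
    # Length-prefix r so that (r, x) is encoded unambiguously.
    digest = hashlib.sha256(len(r).to_bytes(2, "big") + r + x).digest()
    return (r, digest)

def verify(x: bytes, y: Tuple[bytes, bytes]) -> bool:
    """Verification algorithm V: decide whether y is a hash of x."""
    r, digest = y
    return secrets.compare_digest(oracle_hash(x, r)[1], digest)
```

No secret key is involved: anyone holding x and y can run `verify`, which mirrors the remark that all functions can be invoked by everyone.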

⁵ The verification property is somewhat reminiscent of signature schemes. Indeed,
this is exactly what will be required in our decryption verification step below. It is 
stressed, however, that here no secret keys are involved and all the functions can be 
invoked by everyone [7]. 
Canetti also considers the case where some (partial) information on x is already
known. E.g., if for some public function f, f(x) leaks some partial information
on x,⁶ then (f(x), h(x, r)) still should be computationally indistinguishable
from (f(x), h(y, r)) (for details see [7], p. 467).

3.2 The Proposed Encryption and Decryption Schemes 

Let $|x|$ denote the length of the string $x$. The concatenation of two strings $x$ and
$y$ is denoted by $x \,\|\, y$ and the bit-wise exclusive-or of $x$ and $y$ is denoted by $x \oplus y$.
We generally use the notation $a = b \bmod n$ to denote the principal remainder
$a$, that is the unique integer $a \in \{0, \ldots, n - 1\}$ that is congruent to $b$ modulo $n$.
We will assume that all calculations are carried out modulo $n = pq$. If $W$ is the
message to be encrypted let $w = 0 \ldots 0 \,\|\, W$ be the padded message of $W$ such that
$|w| = |n|$.

Throughout, g will denote a cryptographic hash function to {0,1} that is 
both collision resistant and pre-image resistant, while h will denote a Canetti- 
oracle hash function (cf. section 3.1). 

The Proposed Encryption Routine $\mathcal{E} = \mathcal{E}(w)$.

1. Choose randomly a session key S and a randomizer R from {0, 1}™ such 
that for 

$w_R = w \oplus h(S, R)$, and $S_R = S \oplus h(w_R, R)$,
one has $\left(\frac{w_R^2 - c}{n}\right) = \left(\frac{S_R^2 - c}{n}\right) = -1$.

2. Calculate $a(w_R), b(w_R), E(w_R)$ and $a(S_R), b(S_R), E(S_R)$ w.r.t. $b_1 = -1$ following
the routines of section 2.1.

3. Put $H = g(0 \ldots 0\, a(w_R) \,\|\, 0 \ldots 0\, a(S_R) \,\|\, S)$, where each of the two zero-padded blocks has length $|n|$.

4. Send the cryptogram $C = [c_1, c_2, c_3, c_4] = [E(w_R), E(S_R), R, H]$.



The Proposed Decryption Routine $\mathcal{D} = \mathcal{D}(c_1, \ldots, c_4)$.

1. Decrypt $c_1$ to obtain $\sigma(w_R)\,a(w_R) \bmod n$, $\sigma(w_R)\,b(w_R) \bmod n$ following the
formulas of section 2.1 for $b_1 = -1$.

2. Decrypt $c_2$ to obtain $\sigma(S_R)\,a(S_R) \bmod n$, $\sigma(S_R)\,b(S_R) \bmod n$ following the
formulas of section 2.1 for $b_1 = -1$.

3. Select the signs $\sigma(w_R) \in \{-1, 1\}$, $\sigma(S_R) \in \{-1, 1\}$ and calculate the
corresponding $w_R$ and $S_R$.

4. Calculate $S = h(w_R, c_3) \oplus S_R$.

5. Check whether

$$
c_4 = g(0 \ldots 0\, a(w_R) \,\|\, 0 \ldots 0\, a(S_R) \,\|\, S), \tag{3}
$$

where each of the two zero-padded blocks has length $|n|$.



⁶ It only makes sense to consider the case where f does not give full information on
x. Thus, f should be one-way, or uninvertible (without the use of the secret key).
6. - If step (5) returns ‘true’, output $w = h(S, c_3) \oplus w_R$.

- Otherwise goto step (3), select another sign and repeat.

- If step (5) returns ‘false’ for all $\sigma(w_R) \in \{-1, 1\}$, $\sigma(S_R) \in \{-1, 1\}$ then
return “NULL”.
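
The masking layer of the two routines (encryption step 1 and decryption steps 4 and 6) can be isolated in a short sketch. The code below shows only that $w_R = w \oplus h(S, R)$, $S_R = S \oplus h(w_R, R)$ is undone by $S = h(w_R, R) \oplus S_R$, $w = h(S, R) \oplus w_R$. It deliberately leaves out the Jacobi-symbol condition, the Williams encryption of $w_R$ and $S_R$, and the hash check (3). SHAKE-256 is an assumed stand-in for the oracle hash h, and the names `mask` and `unmask` are hypothetical.

```python
import hashlib

def h(x: bytes, r: bytes, length: int) -> bytes:
    """Stand-in for h(x, r): `length` bytes of SHAKE-256 output (assumption)."""
    return hashlib.shake_256(len(x).to_bytes(4, "big") + x + r).digest(length)

def xor(u: bytes, v: bytes) -> bytes:
    """Bit-wise exclusive-or of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(u, v))

def mask(w: bytes, S: bytes, R: bytes):
    """Encryption step 1: derive (w_R, S_R) from message w, session key S, randomizer R."""
    w_R = xor(w, h(S, R, len(w)))
    S_R = xor(S, h(w_R, R, len(S)))
    return w_R, S_R

def unmask(w_R: bytes, S_R: bytes, R: bytes):
    """Decryption steps 4 and 6: recover S first, then w."""
    S = xor(h(w_R, R, len(S_R)), S_R)
    w = xor(h(S, R, len(w_R)), w_R)
    return w, S
```

The order matters: $S$ can only be recovered once $w_R$ is known, and $w$ only once $S$ is known, which is why the decryption routine computes them in that sequence.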



Note that $b_1$ is fixed. The correct values $b_2$ follow directly from construction,
since $a(x)$ is directly in the scope of $h$. It can easily be seen that the signs
$\sigma$ of $\pm a(w_R)$, $\pm a(S_R) \bmod n$, respectively, that pass the test in the decryption
routine, are exactly the signs of the input to the hash function in the encryption
routine, respectively.⁷ Hence, we have

Lemma 1. For the above routines, the decryption of an encryption of any mes- 
sage always gives this message. 



Remark 3.
- The testing check during decryption captures Canetti’s verification
property. H takes the role of the signing algorithm (with respect to the
underlying w and S), and the testing step (3) takes the role of the signature
verification algorithm.

- Due to the strong security properties which are achieved, some message
expansion is to be expected. The entire cryptogram can be viewed as an
encryption with combined signature. The hash value provides a proof of
knowledge of the plaintext w and the secret parameter S. In such a setting
message expansion is typical, e.g., [10,20]. More length efficient proposals
have been made in [26] but the claims were not proved. This was recently
done in [1] in the ROM.

4 Proof of Security 

4.1 Semantic Security against Chosen Plaintext Attacks 

An adversary $A = (A_1, A_2)$ defining security against CPA is usually described
via the well-known game play [2]. At first, $A_1$ is run on input the public key,
$pk$. At the end of $A_1$'s execution he outputs a triple $(w_0, w_1, s)$, where $w_0, w_1$ are
messages of the same length and $s$ is some state information. A random one of
$w_0$ and $w_1$ is selected, say $w_b$, and a 'challenge' $y$ is determined by encrypting
$w_b$ under $pk$. $A_2$ is given $y$ but not $w_b$. It is now $A_2$'s job to determine $b$, that is,
to decide if $y$ is the encryption of $w_0$ or of $w_1$. In public key cryptography such
an attack is always possible, since any adversary has access to the encryption
oracle, as $pk$ is always publicly known.
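The game can be written as a small harness. This is a hedged sketch: the function names and the toy scheme are hypothetical and not part of the paper. The toy scheme is deliberately deterministic, which shows why access to the encryption oracle alone already defeats any deterministic public-key scheme in this game.

```python
import secrets

def cpa_game(keygen, encrypt, A1, A2) -> bool:
    """Play one round of the CPA game; return True if the adversary guesses b."""
    pk, _sk = keygen()
    w0, w1, state = A1(pk)           # A1 outputs two messages and state information
    assert len(w0) == len(w1)        # messages must have the same length
    b = secrets.randbelow(2)         # random choice of w_b
    y = encrypt(pk, (w0, w1)[b])     # challenge: encryption of w_b under pk
    return A2(y, state) == b         # A2 sees y but not w_b

# Toy deterministic "scheme" (hypothetical): XOR with a public byte.
def toy_keygen():
    return 0x5A, None

def toy_encrypt(pk, w):
    return bytes(x ^ pk for x in w)

def A1(pk):
    w0, w1 = b"\x00" * 4, b"\xff" * 4
    return w0, w1, (pk, w0)

def A2(y, state):
    pk, w0 = state
    # The adversary re-encrypts w0 himself and compares with the challenge.
    return 0 if toy_encrypt(pk, w0) == y else 1

wins = sum(cpa_game(toy_keygen, toy_encrypt, A1, A2) for _ in range(100))
```

Against a semantically secure scheme the win rate should stay negligibly close to one half; here the adversary wins every round because re-encryption is reproducible.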



7 Clearly, the party evaluating the hash value $H$ can replace (the correct) $a(x)$ by
$-a(x) \bmod n$ and use this as a forged hash input. Then in the deciphering process
the wrong $\sigma$ will be determined. In that case, it can easily be seen [14] that the
(Williams) decryption of $x$ obtained equals $x(-\sigma) = c/x \bmod n$. Contrary to the
forgery w.r.t. $b_1$ this, however, does not expose the factorization of $n$.
Canetti showed how oracle hash functions can be used to build a crypto
scheme that is semantically secure against chosen plaintext attacks [7], p. 466f.
Typically, some information $f = f(x)$ is part of the cryptogram and hence
establishes some public information on the secret parameter $x$. Canetti assumes
that $f$ is uninvertible so that this information leakage does not allow complete
retrieval of $x$.

In our case this leads to the following technical requirement. We will assume
that given $E(S_R)$, $g = g(0 \ldots 0\; a(w_R) \,\|\, 0 \ldots 0\; a(S_R) \,\|\, S)$, it is impossible to find
the complete underlying secret parameter $S$.

Remark 4. This assumption actually is not very strong. Informally, we have
the following. Due to the Canetti hash function $h$ involved, by construction no
information on $w_R$ leaks from $E(S_R)$ even if $E$ does leak some information on
$S_R$, where $E$ denotes the Williams encryption of section 2.1. Also, if $a(w_R)$
did leak from $E(S_R)$ and $g$, then, since retrieving $w_R$ from $a(w_R)$ is equivalent
to factoring $n$, $w_R$ cannot completely be recovered, so that an adversary has
no information on $h(w_R, R)$. A lack of complete knowledge of $h(w_R, R)$ implies
a lack of complete knowledge of $S$, even if $S_R$ could completely be recovered
from $E(S_R)$ and $g$. Similarly, if some partial information on $S_R$ and $w_R$ can be
obtained by the combined knowledge of $E(S_R)$ and $g$, again by the Canetti hash
function $h$, $S$ cannot completely be recovered. Thus, $S$ would need to leak in full
from $g$ to violate our assumption.

Analogously as in [7], we obtain the semantic security of the proposed scheme. 

Theorem 1. The proposed cryptosystem is semantically secure against adaptive 
chosen plaintext attacks, if the factorization of n = pq is hard, h is a Canetti 
oracle hash function with the additional technical assumption above on the cryp- 
tographic function g. 

Proof. (Sketch) Assume an adversary $A$ that does break the scheme under a
CPA. Let the probability for his success be as defined in the proof to Theorem
10 in [7]. The tuple $E(S_R)$, $g = g(0 \ldots 0\; a(w_R) \,\|\, 0 \ldots 0\; a(S_R) \,\|\, S)$, yields some
information $f$ on $S$ which by the assumption above corresponds to the uninvertible
function $f$ in Canetti's case.

Construct an algorithm $\mathcal{D}$ that distinguishes between $(f(S), h(S,R))$ and
$(f(S), h(S',R))$, where $S, S', R$ are randomly chosen and $f(S) = (E(S_R), g)$.
Since $R$ is public, $h(S,R) = R, h(S,R)$, and by the requirement that $h$ is a
Canetti hash function, it follows that for uniformly chosen $S, R$ the value $h(S,R)$
is uniform in $\{0,1\}^l$ for some $l$.

Given $f(S), R, \xi$ (where $\xi$ is either $h(S,R)$ or $h(S',R)$), the distinguisher $\mathcal{D}$
will construct a ciphertext in the following way. $\mathcal{D}$ may choose either one of $w_0$
or $w_1$ as message in the game play defining security against CPA. Assume that
he chooses $w_1$. Then he obtains $w_R = w_1 \oplus \xi$ and he can hand $A$ the 'ciphertext'
$C = [E(w_R), E(S_R), R, g]$. Now, if $A$ outputs '$w_1$' then $\mathcal{D}$ outputs '$\xi = h(S,R)$'.
Otherwise $\mathcal{D}$ outputs '$\xi = h(S',R)$'.

As in Canetti's case this follows since in the former event the constructed
$w_R$ was the correct one, while in the latter, it must have been equal to
$w_R = w_1 \oplus h(S',R) \neq w_1 \oplus h(S,R)$. In particular, then $A$ is given an encryption
of a uniformly chosen message. The decryption cannot be $w_1$, hence in that case,
by the CPA game, it can only be $w_0$, which $A$ outputs.

Analyzing $\mathcal{D}$ is straightforward with the exact success probability given in [7].
The existence of such a distinguisher yields a contradiction to the assumption.

□ 



4.2 ‘Weak-Sole-Samplability’ 

Recall the notion of a valid ciphertext: a ciphertext is valid if the decryption
oracle does not reject it. We now completely characterize all the ways in which
valid ciphertexts can be obtained for the proposed scheme.

The randomizer $R$, since it directly occurs in $C$, plays a unique role. Nonetheless,
this information cannot be used for any attack. (Compare also Canetti's
discussion on this public randomizer [7].)

Lemma 2. Let $C = [c_1, c_2, c_3, c_4]$ be a valid ciphertext, $h$ a Canetti oracle hash
function, $g$ a cryptographic hash function, and suppose that the factorization of $n$
is infeasible. If in $C$ the $c_3$ gets modified, then a necessary condition for obtaining
another valid ciphertext is that all entries in $C$ get modified.

Proof. We analyze any adversary that tries to obtain another valid ciphertext.
Let $c_3'$ be the modified value and let $C$ be the encryption of the message $w$ relative
to the session key $S$ and the randomizer $R$. We can assume that the adversary
knows $w$ (e.g. by mounting a CCA). We can also assume that he knows $S$ (e.g.
by his own encryption) because otherwise any such attack would not be possible
(this follows from the fact that $c_4$ remains unchanged, $g$ is both collision resistant
and pre-image resistant and since the given $C$ is a valid ciphertext). By their
definition he then also knows $w_R$ and $S_R$.

Suppose firstly that $C' = [c_1, c_2, c_3', c_4]$ is also valid. Then the validity check
(3) passes if $c_4 = g(\ldots \| S) = g(\ldots \| S')$, where $S' = S'(c_3')$ is in the fourth step
of the deciphering oracle computed as $S' = h(\sigma(w_R) w_R, c_3') \oplus S_R$. By the choice
of $h$ necessarily $S' = S$ so that $h(\sigma(w_R) w_R, c_3')$ has to evaluate to $S \oplus S_R$. But
that would imply that $h(w_R, c_3) = h(w_R, c_3')$ which is extremely unlikely [7].

Similarly, we see that $C' = [c_1, c_2', c_3', c_4]$ where $c_2'$ is determined a priori, leads
to a contradiction. Consequently, the adversary needs to evaluate a modified $c_2'$
accordingly, i.e. such that the properties of the hash function are not being
violated. This is only possible if at first the hash input, that is some $c_3'$, is being
selected. As above, we again need to have $S' = S$, where now $S' = h(w_R, c_3') \oplus
D(c_2')$, and $D$ is the Williams decryption of section 2.1. From the hash output
and $S' = S$ the adversary then obtains the decrypted value $x = D(c_2')$ (w.r.t.
$b_1 = -1$) of the forged $c_2'$, that is, $x = S_R'$ (respectively $c/S_R'$).

However, this $x$ has to be of a special form (this will lead to the contradiction
below), because in the validity check it is required that

$c_4 = g(\ldots \| 0 \ldots 0\; \sigma(S_R) a(S_R) \| \ldots) = g(\ldots \| 0 \ldots 0\; \sigma(x) a(x) \| \ldots)$

(where we assume that the hash input is split up according to the appropriate
lengths).

By Proposition 4 the above identity is only possible if either $a(S_R) = a(x)
\bmod n$, or $a(S_R) = a(c/x) \bmod n$, depending on whether the above $\sigma$'s correspond
or not. According to Proposition 3 and Corollary 1,

either $S_R^2 = x^2 \bmod n$ or $S_R^2 = (c/x)^2 \bmod n$.

But we also have that $c_2' \neq c_2 \bmod n$. Also, by assumption, both $C$ and $C'$ are
valid which means that the test passes for exactly one $\sigma(S_R)$ and thus for exactly
one $\sigma(x)$ which then yields the corresponding values, $x$ or $c/x$, respectively.

Since the decryption of $c_2'$, as well as that of $c_2$, is being conducted w.r.t.
$b_1 = -1$, the preimages, $x$ and $S_R$, respectively $c/x$ and $S_R$, need to be distinct
mod $n$. Then also $x$ and $S_R$, respectively $c/x$ and $S_R$ need to be distinct since
otherwise $s^2 = c \bmod n$, contrary to the choice of $s$.

Further, we can show that $x \neq -S_R$, respectively $c/x \neq -S_R \bmod n$. These
two cases can be dealt with the same way. Observe that $x$ was defined according
to the hash output of $c_3'$, i.e. as $x = h(w_R, c_3') \oplus S$. If we assume that $x =
-S_R \bmod n$ then $c_3'$ (which has been selected a priori) would hash to the specific
output $S \oplus (-S_R) \bmod n$, a contradiction. Analogously, $x \neq -S_R$, respectively
$c/x \neq -S_R \bmod n$. But then $\gcd(x - S_R, n)$, respectively $\gcd(c/x - S_R, n)$ is
a proper factor of $n$. To find this factor the adversary only needs to know $x$,
respectively $c/x$ and $S_R$, which he does when he knows $x$.
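The factoring step used here is the standard one: two values with equal squares modulo $n$ that are neither equal nor negatives of each other reveal a proper factor through a gcd. A minimal sketch with a toy modulus (the function name and the numbers are illustrative assumptions, not taken from the paper):

```python
from math import gcd

def factor_from_square_roots(x: int, y: int, n: int) -> int:
    """Return a proper factor of n, given x^2 = y^2 (mod n) with x != +-y (mod n)."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 differ mod n")
    if (x - y) % n == 0 or (x + y) % n == 0:
        raise ValueError("x = +-y mod n: no factor is revealed")
    # n divides (x - y)(x + y) but neither factor alone, so the gcd is proper.
    return gcd(x - y, n)

n = 7 * 11                                # toy modulus
p = factor_from_square_roots(9, 2, n)     # 9^2 = 81 = 4 = 2^2 (mod 77)
assert 1 < p < n and n % p == 0
```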

Observe that the adversary already knows $w$, $w_R$ and $S_R$. However, Proposition
2 asserts that the adversary can calculate $x$ from $E(x) = c_2'$ and $a(x) =
a(S_R)$ respectively $-a(S_R) \bmod n$.

Thus, the adversary would find the factorization of $n$. The derived contradiction
to the hypothesis of the lemma implies that the adversary cannot compute
a valid ciphertext by just forging $c_2$ and $c_3$.

The adversary can also try to forge $c_1$. But, in order to pass the test, then
he would need to know $c_1'$ along with the corresponding $\sigma(y) a(y)$, where $y$
(respectively $c/y$) is the decryption of $c_1'$ (w.r.t. $b_1 = -1$).

As decrypting $c_1'$ or determining this $a(y)$ is equivalent to factoring (Proposition
1), the adversary can only, conversely, define $y$ as $w_R'$ and encrypt $y$ (w.r.t.
$b_1 = -1$) to obtain his forged $c_1'$. Similarly as above, he needs to evaluate $S_R'$ as
$h(w_R', c_3') \oplus S$ in order to fulfill the requirement on the hash function in (3) with
respect to the last block in the input. But this now constitutes a special form
of the attack considered above. The adversary would have to forge $c_2$ which is
impossible, independent of the choice of $c_1$. □

Let us consider an adversary that has access to $g$, $h$, $\mathcal{E}$, and $\mathcal{D}$. He can play
with his encryption oracle, and may also make $t$ queries of the decryption oracle.
He then produces a new valid ciphertext that he outputs. As in [2] we demand
that the adversary never outputs a string that coincides with the value returned
from some $\mathcal{E}$-query.
The basic idea in both Lemma 2 and Theorem 2 below is to analyze the
different possibilities as how an attacker might be able to reuse existing valid 
ciphertexts. That is, we investigate all ways for obtaining valid ciphertexts (other 
than running the encryption oracle). 

We will give a complete characterization of all possibilities to find a valid 
ciphertext. Depending on whether the adversary knows the secret parameter S 
corresponding to some known valid ciphertext, he may follow only one of the 
specific steps given in the proof below. In each of these particular cases the proof 
also shows that the adversary is not able to generate any new valid ciphertext 
whose plaintext he does not know. 

For $1 \le i \le t$, let $C^{(i)} = [c_1^{(i)}, c_2^{(i)}, c_3^{(i)}, c_4^{(i)}]$ be the $i$th valid cryptogram that the
adversary gets decrypted. Let $C'$ be the new valid ciphertext that the adversary
produces. By Lemma 2 we only need to distinguish between the following types
of attacks.

- Type I: There is some $1 \le j \le t$ such that for $C^{(j)} = [c_1, c_2, c_3, c_4]$,

(a) $C' = [c_1' \neq c_1, c_2, c_3, c_4]$, (b) $C' = [c_1, c_2' \neq c_2, c_3, c_4]$,

(c) $C' = [c_1' \neq c_1, c_2' \neq c_2, c_3, c_4]$,

- Type II: There is some $1 \le j \le t$ such that for $C^{(j)} = [c_1, c_2, c_3, c_4]$,

(a) $C' = [c_1, c_2, c_3, c_4' \neq c_4]$, (b) $C' = [c_1' \neq c_1, c_2, c_3, c_4' \neq c_4]$,

(c) $C' = [c_1, c_2' \neq c_2, c_3, c_4' \neq c_4]$, (d) $C' = [c_1' \neq c_1, c_2' \neq c_2, c_3, c_4' \neq c_4]$,

- Type III: For all $i$, $C' = [c_1' \neq c_1^{(i)}, c_2' \neq c_2^{(i)}, c_3' \neq c_3^{(i)}, c_4' \neq c_4^{(i)}]$.
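The case distinction can be made concrete by comparing a candidate $C'$ component-wise with one previously decrypted ciphertext. The helper below is a hypothetical illustration, not part of the paper; it assumes that every type II attack modifies $c_4$, and it reports the type III pattern relative to a single query only (type III proper requires that pattern for all $i$).

```python
def attack_type(new, old):
    """Classify C' = new against one queried ciphertext C = old (both 4-tuples)."""
    d1, d2, d3, d4 = (a != b for a, b in zip(new, old))
    if not (d1 or d2 or d3 or d4):
        return "replay"                      # identical ciphertext, excluded by the game
    if d3:
        # Lemma 2: once c3 is modified, all entries must be modified.
        return "III" if (d1 and d2 and d4) else "excluded by Lemma 2"
    if not d4:
        return {(True, False): "I(a)", (False, True): "I(b)", (True, True): "I(c)"}[(d1, d2)]
    return {(False, False): "II(a)", (True, False): "II(b)",
            (False, True): "II(c)", (True, True): "II(d)"}[(d1, d2)]

old = ("c1", "c2", "c3", "c4")
assert attack_type(("x", "c2", "c3", "c4"), old) == "I(a)"
assert attack_type(("c1", "c2", "c3", "y"), old) == "II(a)"
```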

Theorem 2. Assume that $h$ is a Canetti oracle hash function, $g$ is a cryptographic
hash function, and that it is computationally infeasible to find the factorization
of $n$. Then the above encryption scheme is weak-sole-samplable. Any
valid cryptogram that is not an $\mathcal{E}$ output, has to be the result of some type II
or III attack with the individual steps described below. In both cases, the adversary
then knows the underlying $w$, $S$, as well as the underlying signs $\sigma$ in the
hash-input.

Proof. Type I attacker: 

Suppose we have a type I (a) attacker. Because $c_4$ is fixed we can as in the
proof to Lemma 2 assume that the attacker knows the corresponding $w$ and $S$.
Observe that, since $C$ is valid, the $S = h(w_R, c_3) \oplus S_R$ obtained in the fourth
step of $\mathcal{D}$ passes the test (3) for the unique $\sigma(w_R)$ and thus for the unique $w_R$.
If now $c_1 \neq c_1'$ then the $S'$ obtained will be different from $S$. This follows, since
for fixed $b_1 = -1$ the decryption of $c_1'$ is either $w_R'$ or $c/w_R'$. These values are
different from $w_R$ because otherwise $c_1 = c_1'$. Therefore the test will reject for
this $S'$. In order to obtain the same $S$, also $c_2$ would have to be modified, which
is not the case under the type of attack under inspection.

Similarly we see that a type I (b) attack will be rejected by the test because
the $S'$ obtained in step (4) will not match the valid $S$.

Now consider a type I (c) attacker. In order to guarantee that the $S'$ obtained
in step (4) equals the valid $S$, the adversary can only proceed analogously as
in the proof to Lemma 2. He needs to define the (Williams) decryption of the
modified $c_2'$, that is $S_R'$, as $S \oplus h(y, c_3)$, where $y = w_R'$ is the decryption of $c_1'$.
But these $w_R'$ and $S_R'$ need to pass (3). Similarly as in Lemma 2, he would be
able to factorize $n$, a contradiction. Hence, any type I attack will get rejected as
well.

Type II attacker: 

For a type II (a) attacker observe that by definition $c_1, c_2, c_3$ remain unchanged.
Hence, in steps 2 and 3 during decryption, the quantities $\pm a(w_R)$, $\pm a(S_R)$
corresponding to the original $w$ and $S$ are obtained. Since $c_4 = g(\sigma(w_R) a(w_R)
\,\|\, \sigma(S_R) a(S_R) \,\|\, S)$ for the specific $\sigma(w_R)$, $\sigma(S_R)$, one can only obtain a modified
hash output w.r.t. different signs, $\sigma(w_R)$ and/or $\sigma(S_R)$. The requirement on $g$
necessitates that the adversary knows the individual blocks in the hash input
(he can only obtain the output from the input). As he also needs to know $c_1$
and $c_2$, by Lemma 2, he knows the modified message $w$ as well as the modified
$S$ that result in the modified cryptogram due to the change of the $\sigma$'s.

In a type II (b) attack the test only passes if the hash output, $c_4'$, is calculated
as the hash-output w.r.t. the modified $c_1'$. Then the adversary has to know
the $\sigma(w_R') a(w_R')$ that is obtained by decrypting $c_1'$. As usual, by Lemma 2, we
conclude that he can find $w_R'$, respectively $c/w_R'$. To obtain the hash value $c_4'$
he also needs to know the $\sigma(S_R) a(S_R)$. Again, since he knows $c_2$ he then knows
$S_R$. Depending on the $\sigma$'s selected he obtains two different (modified) $S$ and
four different (modified) $w$. He can easily verify which of those have the desired
encryptions so that he knows the modified $w$ and $S$ that result in $C'$.

Exactly the same way we can show that in a type II (c) attack the adversary
needs to know the underlying quantities that result in $C'$.

A type II (d) attack can be dealt with analogously, because knowledge of $a(x)$
and $E(x)$ is equivalent to knowing $x$, where $x$ firstly is $w_R'$ and then secondly $S_R'$.

Type III attacker: 

The result follows exactly as for a type II (d) attacker because the value $c_3$ is not
essential. The adversary would need to know the first two blocks of the input to
the hash function. Along with $c_1'$ and $c_2'$ this is equivalent to knowing $w_R'$ and
$S_R'$ where $c_3' = R'$. However, since $c_3'$ is public one easily finds the underlying $w'$
and $S'$ from the randomized $w_R'$ and $S_R'$.

We have shown that valid ciphertexts cannot be obtained apart from knowing 
their underlying parameters, which completes the proof of Theorem 2 in all cases. 

□ 
Abstract. Almost all of the current public-key cryptosystems (PKCs)
are based on number theory, such as the integer factoring problem and 
the discrete logarithm problem (which will be solved in polynomial-time 
after the emergence of quantum computers). While the McEliece PKC 
is based on another theory, i.e. coding theory, it is vulnerable against 
several practical attacks. In this paper, we carefully review currently 
known attacks to the McEliece PKC, and then point out that, without 
any decryption oracles or any partial knowledge on the plaintext of the 
challenge ciphertext, no polynomial-time algorithm is known for invert- 
ing the McEliece PKC whose parameters are carefully chosen. Under the 
assumption that this inverting problem is hard, we propose slightly mod- 
ified versions of McEliece PKC that can be proven, in the random oracle 
model, to be semantically secure against adaptive chosen-ciphertext at- 
tacks. Our conversions can achieve the reduction of the redundant data 
down to 1/3 ~ 1/4 compared with the generic conversions for practical 
parameters. 



1 Introduction 

Since the concept of public-key cryptosystem (PKC) was introduced by Diffie
and Hellman [5], many researchers have proposed numerous PKCs based on
various problems, such as integer factoring, discrete logarithm, decoding a large 
linear code, knapsack, inverting polynomial equations and so on. While some of 
them are still alive, most of them were broken by cryptographers due to their 
intensive cryptanalysis. As a result, almost all of the current (so-called) secure 
systems employ only a small class of PKCs, such as RSA and elliptic curve 
cryptosystems, which are all based on either integer factoring problem (IFP) or 
discrete logarithm problem (DLP). This situation would cause a serious problem 
after someone discovers one practical algorithm which breaks both IFP and DLP 
in polynomial-time. No one can say that such an algorithm will never be found. 
Actually, Shor has already found a (probabilistic) polynomial-time algorithm in
[25], even though it requires a quantum computer that is impractical so far. In

K. Kim (Ed.): PKC 2001, LNCS 1992, pp. 19-35, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
order to prepare for that unfortunate situation, we need to find another secure
scheme relying on neither IFP nor DLP. 

The McEliece PKC, proposed by R.J. McEliece in [18], is one of few alter- 
natives 1 for PKCs based on IFP or DLP. It is based on the decoding problem 
of a large linear code with no visible structure which is conjectured to be an 
NP-complete problem. 2 While no polynomial-time algorithm has been discov- 
ered yet for decoding an arbitrary linear code of large length with no visible 
structure, a lot of attacks (some of them work in polynomial-time) are known 
to the McEliece PKC [1,3,4,12,15,28,17,13]. 

In this paper, we carefully review these attacks in Section 3, and then point 
out that all the polynomial-time attacks to the McEliece PKC require either de- 
cryption oracles or partial knowledge on the corresponding plaintext of the chal- 
lenge ciphertext. And then without them, no polynomial-time attack is known 
to invert the McEliece PKC (whose parameters are carefully chosen) . Under the 
assumption that this inverting problem is hard, we convert this problem into 
semantically secure McEliece PKCs against adaptive chosen-ciphertext attacks
(CCA2) by introducing some appropriate conversions. We discuss which con- 
versions are appropriate for the McEliece PKC in Section 4. While some of the 
generic conversions proposed in [24,9] are also applicable to the McEliece PKC, 
they have a disadvantage in data redundancy (which is defined by the difference 
between the ciphertext size and the plaintext size) . A large amount of redundant 
data is needed for the generic conversions since the block size of the McEliece 
PKC is relatively large. Our conversions in Section 4.4 need less redundant data 
than the generic ones. 

2 McEliece Public-Key Cryptosystem 

In this section, we briefly describe the McEliece PKC. 

Key generation: Generate the following three matrices $G, S, P$:

$G$: $k \times n$ generator matrix of a binary Goppa code which can correct up
to $t$ errors, and for which an efficient decoding algorithm is known. The
parameter $t$ is given by $\lfloor (d_{min} - 1)/2 \rfloor$ where $d_{min}$ denotes the minimum
Hamming distance of the code.

$S$: $k \times k$ random binary non-singular matrix.

$P$: $n \times n$ random permutation matrix.

Then, compute the $k \times n$ matrix $G' = SGP$.

Secret key: $(S, G, P)$

Public key: $(G', t)$

Encryption: The ciphertext $c$ of a message $m$ is calculated as follows:

$c = mG' \oplus z$   (1)

1 Another alternative may be a quantum public-key cryptosystem [21] which will be 
available after the emergence of quantum computers. 

2 The complete decoding problem of an arbitrary linear code is proven to be NP- 
complete in [29]. 
where $m$ is represented in a binary vector of length $k$, and $z$ denotes a random
binary error vector of length $n$ having $t$ 1's.

Decryption: First, calculate $cP^{-1}$

$cP^{-1} = (mS)G \oplus zP^{-1}$   (2)

where $P^{-1}$ denotes the inverse of $P$. Second, apply the decoding algorithm
$EC$ for $G$ to $cP^{-1}$. Since the Hamming weight of $zP^{-1}$ is $t$, one can obtain
$mS$

$mS = EC(cP^{-1})$.   (3)

The plaintext of $c$ is given by

$m = (mS)S^{-1}$.   (4)
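Equations (1) to (4) can be traced end to end on a toy instance. The sketch below is an assumption-laden illustration: a systematic [7,4] Hamming code ($t = 1$) stands in for the binary Goppa code, $S$ is a fixed matrix rather than a random one, and the helper names (`matmul`, `inverse`, `G_pub`) are hypothetical. It offers no security whatsoever.

```python
import random

def matmul(A, B):
    """Matrix product over GF(2); matrices are lists of rows."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]

def inverse(A):
    """Invert a non-singular square matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Systematic [7,4] Hamming code (stand-in for the Goppa code), corrects t = 1 error.
G = [[1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1]]

def EC(y):
    """Decoder for G: correct at most one error, return the 4 information bits."""
    s = [sum(h & b for h, b in zip(row, y)) % 2 for row in H]
    y = y[:]
    if any(s):
        y[[list(c) for c in zip(*H)].index(s)] ^= 1  # syndrome equals the erroneous column
    return y[:4]

S = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]  # fixed, non-singular
perm = random.sample(range(7), 7)
P = [[int(perm[i] == j) for j in range(7)] for i in range(7)]
P_inv = [list(r) for r in zip(*P)]       # inverse of a permutation matrix is its transpose
S_inv = inverse(S)
G_pub = matmul(matmul(S, G), P)          # G' = S G P

def encrypt(m, t=1):
    z = [0] * 7
    for i in random.sample(range(7), t):
        z[i] = 1
    return [a ^ b for a, b in zip(matmul([m], G_pub)[0], z)]   # (1)

def decrypt(c):
    y = matmul([c], P_inv)[0]            # (2): c P^-1 = (mS)G xor z P^-1
    mS = EC(y)                           # (3)
    return matmul([mS], S_inv)[0]        # (4)

assert decrypt(encrypt([1, 0, 1, 1])) == [1, 0, 1, 1]
```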

3 Attacks to McEliece PKC 

In this section, we review currently known attacks to the McEliece PKC. 

While no efficient algorithm has been discovered yet for decomposing $G'$
into $(S, G, P)$ [19], a structural attack has been discovered in [17]. This attack
reveals part of the structure of a weak $G'$ which is generated from a "binary" Goppa
polynomial. However, this attack can be avoided simply by avoiding the use of
such weak public keys. (This implies $G$ should not be a BCH code since it is
equivalent to a Goppa code whose Goppa polynomial is $1 \cdot x^{2t}$, i.e. "binary". 3)
The next case we have to consider is that an equivalent Goppa code of $G'$ (which is
not necessarily $G$) and whose decoding algorithm is known happens to be found.
This probability is estimated in [1,10], and then shown to be negligibly small.

All the other known attacks are for decrypting ciphertexts without breaking
public keys. We categorize them into the following two categories, critical attacks
and non-critical attacks, according to whether these attacks can be avoided simply
by enlarging the parameter size or not. If an attack can be avoided this way, we
categorize it as non-critical; otherwise, as critical. Interestingly, all the critical
attacks require either additional information, such as partial knowledge on the
target plaintexts, or a decryption oracle which can decrypt arbitrarily given
ciphertexts except the challenge ciphertexts. Without this additional
information and this ability, no efficient algorithm is known to decrypt an
arbitrarily given ciphertext of the McEliece PKC.

3.1 Non-critical Attacks 

The following two attacks can be avoided simply by enlarging the parameter size 
or by applying Loidreau’s modification [16] without enlarging the parameter size. 
Thus, not critical. 

3 In [14], a variant of McEliece PKC, where G is a BCH code, was broken. However 
it is not clear their attack works correctly since further information has failed to 
appear. 
Generalized Information-Set-Decoding Attack. Let $G'_k$, $c_k$ and $z_k$ denote
the $k$ columns picked from $G'$, $c$ and $z$, respectively. They have the following
relationship

$c_k = m G'_k \oplus z_k$.   (5)

If $z_k = 0$ and $G'_k$ is non-singular, $m$ can be recovered by [1]

$m = c_k G'^{-1}_k$.   (6)

Even if $z_k \neq 0$, $m$ can be recovered by guessing $z_k$ among small Hamming weights
[15] (we call this the generalized information-set-decoding (GISD) attack). The
correctness of the recovered plaintext $m$ is verifiable by checking whether the
Hamming weight of $c \oplus mG'$ is $t$ or not.

The computational cost of this generalized version (where $z_k$ is guessed) is
slightly lower than that of the original one (where $z_k$ is supposed to be 0), but the
attack is still infeasible for appropriate parameters since its computational cost is
asymptotically lower bounded by $C(n, k)/C(n - t, k)$.
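The bound has a direct reading: $C(n-t,k)/C(n,k)$ is the probability that $k$ randomly picked columns avoid all $t$ error positions, so its reciprocal is the expected number of trials. A small sketch for evaluating it in bits (the function name is a hypothetical choice):

```python
from math import comb, log2

def gisd_log2_cost(n: int, k: int, t: int) -> float:
    """log2 of C(n, k) / C(n - t, k), the expected number of column selections
    until all k picked columns are free of errors."""
    return log2(comb(n, k)) - log2(comb(n - t, k))

# Original parameters suggested by McEliece: (n, k, t) = (1024, 524, 50).
print(f"lower bound: about 2^{gisd_log2_cost(1024, 524, 50):.1f} selections")
```

Each selection additionally costs a matrix inversion, so the figure printed here only bounds the number of iterations, not the total work.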

Finding-Low-Weight-Codeword Attack. This attack uses an algorithm
which finds a low-weight codeword among codewords generated by an arbitrary
generator matrix using a database obtained by pre-computation [26,4]. Since the
minimum-weight codeword of the following $(k + 1) \times n$ generator matrix

$\begin{bmatrix} G' \\ c \end{bmatrix}$   (7)

is the error vector $z$ of $c$ where $c = mG' \oplus z$, this algorithm can be used to
recover $m$ from a given ciphertext $c$.

The precise computational cost of this attack is evaluated in [4], and then
shown to be infeasible to invert $c$ for appropriate parameters, e.g. $n \ge 2048$ and
optimized $k$ and $t$, even though the original parameters $(n, k, t) = (1024, 524, 50)$
suggested in [18] can feasibly be attacked with a work factor of $2^{64.2}$. (Under the
assumption that each iteration is independent, the expected computational cost of
this attack is asymptotically lower bounded by $C(n, k + 1)/C(n - t, k + 1)$ and
therefore it is infeasible for appropriate parameters.)



3.2 Critical Attacks 

The following attacks cannot be avoided by enlarging the parameter size or by 
applying Loidreau’s modification [16]. Therefore critical. 

Known-Partial-Plaintext Attack. The partial knowledge on the target plaintext
drastically reduces the computational cost of the attacks to the McEliece
PKC [4,13].

For example, let $m_l$ and $m_r$ denote the left $k_l$ bits and the remaining $k_r$ bits
in the target plaintext $m$, i.e. $k = k_l + k_r$ and $m = (m_l \| m_r)$. Suppose that an
adversary knows $m_r$. Then the difficulty of recovering the unknown plaintext $m_l$ in
the McEliece PKC with parameters $(n, k)$ is equivalent to that of recovering the
full plaintext in the McEliece PKC with parameters $(n, k_l)$ since

$c = mG' \oplus z$
$c = m_l G'_l \oplus m_r G'_r \oplus z$
$c \oplus m_r G'_r = m_l G'_l \oplus z$
$c' = m_l G'_l \oplus z$   (8)

where $G'_l$ and $G'_r$ are the upper $k_l$ rows and the remaining lower $k_r$ rows in $G'$,
respectively.

If $k_l$ is fixed to a small value, the computational cost of recovering the
unknown $k_l$ bits from $c$, $m_r$ and $G'$ is a polynomial of $n$ since even if non-critical
attacks are used, it is asymptotically bounded by $k_l^3 C(n, k_l)/C(n - t, k_l)$ where
$k_l^3$ is a small constant.
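The reduction in (8) is a plain linear-algebra identity and can be checked numerically. In the sketch below a random binary matrix stands in for the public key $G'$ (an assumption; no Goppa structure is needed for the identity), and the sizes are arbitrary toy values.

```python
import random

def vecmat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(a & b for a, b in zip(v, col)) % 2 for col in zip(*M)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

rng = random.Random(1)
n, k, k_l, t = 16, 8, 3, 2                      # toy sizes (hypothetical)
G_pub = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]  # stand-in for G'
m = [rng.randrange(2) for _ in range(k)]
z = [0] * n
for i in rng.sample(range(n), t):
    z[i] = 1
c = xor(vecmat(m, G_pub), z)                    # c = m G' xor z

m_l, m_r = m[:k_l], m[k_l:]                     # adversary knows m_r only
G_l, G_r = G_pub[:k_l], G_pub[k_l:]             # upper k_l rows, lower k_r rows
c_reduced = xor(c, vecmat(m_r, G_r))            # c' = c xor m_r G'_r
assert c_reduced == xor(vecmat(m_l, G_l), z)    # equation (8): c' = m_l G'_l xor z
```

After this step the adversary faces a McEliece instance of dimension $k_l$ with the same error vector.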



Related-Message Attack. This attack uses the knowledge on the relationship 
between the target plaintexts [3]. 

Suppose two messages $m_1$ and $m_2$ are encrypted to $c_1$ and $c_2$, respectively,
where $c_1 = m_1 G' \oplus z_1$, $c_2 = m_2 G' \oplus z_2$, and $z_1 \neq z_2$. If an adversary knows a
linear relation between the plaintexts, e.g. $\delta m = m_1 \oplus m_2$, then the adversary
can efficiently apply the GISD attack to either $c_1$ or $c_2$ by choosing $k$ coordinates
whose values are 0 in $(\delta m G' \oplus c_1 \oplus c_2)$. This works because $z_1 \oplus z_2 = \delta m G' \oplus c_1 \oplus c_2$
and the Hamming weight $t$ of the error vector $z$ is far smaller than $n/2$. Therefore a
coordinate being 0 in $(\delta m G' \oplus c_1 \oplus c_2)$ should also be 0 in both $z_1$ and $z_2$ with
high probability.

When the same message is encrypted twice (or more) using different error
vectors $z_1$ and $z_2$, the value $z_1 \oplus z_2$ is simply given by $c_1 \oplus c_2$. This case is referred
to as the message-resend attack [3].
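The identity behind this attack, $z_1 \oplus z_2 = \delta m G' \oplus c_1 \oplus c_2$, can be checked on random data. As a hedged illustration, the sketch uses a random binary matrix in place of $G'$ and toy sizes; it also counts how many of the zero coordinates are really error-free in both ciphertexts.

```python
import random

def vecmat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(a & b for a, b in zip(v, col)) % 2 for col in zip(*M)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def error_vector(n, t, rng):
    z = [0] * n
    for i in rng.sample(range(n), t):
        z[i] = 1
    return z

rng = random.Random(7)
n, k, t = 64, 32, 4                              # toy sizes (hypothetical)
G_pub = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]  # stand-in for G'
m1 = [rng.randrange(2) for _ in range(k)]
m2 = [rng.randrange(2) for _ in range(k)]
z1, z2 = error_vector(n, t, rng), error_vector(n, t, rng)
c1, c2 = xor(vecmat(m1, G_pub), z1), xor(vecmat(m2, G_pub), z2)

dm = xor(m1, m2)                                 # relation known to the adversary
diff = xor(xor(vecmat(dm, G_pub), c1), c2)       # dm G' xor c1 xor c2
assert diff == xor(z1, z2)

zero_coords = [i for i, bit in enumerate(diff) if bit == 0]
clean = [i for i in zero_coords if z1[i] == 0 and z2[i] == 0]
print(f"{len(clean)} of {len(zero_coords)} zero coordinates are error-free in both")
```

A zero coordinate fails to be clean only where both error vectors have a 1 in the same position, which is rare when $t$ is much smaller than $n$.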

Reaction Attack. This attack might be categorized as a chosen-ciphertext at- 
tack (CCA), but uses a weaker assumption [12] than the CCA: the adversary 
observes only the reaction of the receiver who has the private-key, but does not 
need to receive its decrypted plaintext. (Similar attack is independently pro- 
posed in [28]. In this attack, an adversary receives the corresponding plaintexts. 
Therefore this attack is categorized as a CCA.) 

The idea of this attack is the following. The adversary flips one or a few bits
of the target ciphertext $c$. Let $c'$ denote the flipped ciphertext. The adversary
transmits $c'$ to the proper receiver and observes his/her reaction. The receiver's
reactions can be divided into the following two:

Reaction A: Return a repeat request to the adversary due to uncorrectable
error or due to the meaningless plaintext.

Reaction B: Return an acknowledgment or do nothing since the proper plaintext
$m$ is decrypted.

If the total weight of the error vector does not exceed $t$ after the flipping, the
reaction B is observed. Otherwise the reaction A is observed. Therefore by repeating
the above observations a polynomial number of times in $n$, the adversary can
determine the error vector. Once the error vector is determined, the corresponding
plaintext is easily decrypted using the GISD attack.
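A minimal simulation of the single-bit variant makes the query count explicit. The receiver is modelled abstractly (an assumption of this sketch): it shows reaction B exactly when the error weight after the flip is at most $t$, which is how a bounded-distance decoder would behave; no actual decoding is performed.

```python
import random

def make_receiver(z, t):
    """Model of the receiver: reaction 'B' iff the error weight after the flip is <= t."""
    def react(flip_index):
        # Flipping an error position removes one error; flipping a clean one adds one.
        weight = sum(z) + (-1 if z[flip_index] else 1)
        return "B" if weight <= t else "A"
    return react

def recover_error_vector(react, n):
    """One query per coordinate: reaction B marks an error position."""
    return [1 if react(i) == "B" else 0 for i in range(n)]

rng = random.Random(3)
n, t = 32, 4                                   # toy sizes (hypothetical)
z = [0] * n
for i in rng.sample(range(n), t):
    z[i] = 1
assert recover_error_vector(make_receiver(z, t), n) == z   # n queries suffice
```

Since the error vector has weight exactly $t$, a single flipped bit always moves the weight to $t - 1$ or $t + 1$, so each query yields one bit of $z$.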

Malleability Attack. This attack allows an adversary to alter any part of the
corresponding plaintext of any given ciphertext $c$ without knowing the plaintext
$m$, i.e. the adversary can generate a new ciphertext $c'$ whose plaintext is $m' =
m \oplus \delta m$ from any given ciphertext $c$ without knowing $m$ [13,28].

This attack is described as follows. Let $G'[i]$ denote the $i$-th row of the public
matrix $G'$ and $I = \{i_1, i_2, \ldots\}$ denote a set of coordinates $i_j$ whose value is 1 in
$\delta m$. The ciphertext $c'$ is calculated by

$c' = c \oplus \bigoplus_{i \in I} G'[i] = (m \oplus \delta m) G' \oplus z = m' G' \oplus z$.   (9)

This attack tells us that the McEliece PKC does not satisfy non-malleability [6]
even against passive attacks, such as chosen-plaintext attacks. Moreover, under the
chosen-ciphertext scenario where an adversary can ask a decryption oracle to
decrypt a polynomial number of ciphertexts (excluding the challenge ciphertext
$c$), the adversary can decrypt any given ciphertext $c$ in the following way. First
the adversary asks the oracle to decrypt $c'$, then the oracle returns $m' = m \oplus \delta m$.
Thus he/she can recover the target plaintext of $c$ by $m = m' \oplus \delta m$.
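Equation (9) only uses the linearity of the encoding, so it can be verified with a random binary matrix standing in for $G'$ (an assumption of this sketch; sizes and names are hypothetical). The adversary's computation touches only $c$, $\delta m$ and the public rows.

```python
import random

def vecmat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(a & b for a, b in zip(v, col)) % 2 for col in zip(*M)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

rng = random.Random(11)
n, k, t = 16, 8, 2                               # toy sizes (hypothetical)
G_pub = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]  # stand-in for G'
m = [rng.randrange(2) for _ in range(k)]
z = [0] * n
for i in rng.sample(range(n), t):
    z[i] = 1
c = xor(vecmat(m, G_pub), z)                     # target ciphertext

dm = [1, 0, 0, 1, 0, 0, 0, 0]                    # bits of m the adversary wants to flip
c_new = c
for i, bit in enumerate(dm):
    if bit:                                      # i ranges over the index set I
        c_new = xor(c_new, G_pub[i])             # add row G'[i] to the ciphertext

# c' is a valid encryption of m xor dm under the same error vector z.
assert c_new == xor(vecmat(xor(m, dm), G_pub), z)
```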

4 Conversions for McEliece PKC 

As mentioned in Section 3, without any decryption oracles and any partial
knowledge on the corresponding plaintext of the challenge ciphertext, no
polynomial-time algorithm is known for inverting the McEliece PKC (whose parameters are
carefully chosen). Under the assumption that this inverting problem is hard, this
problem can be converted into the hard problem of breaking the indistinguishability
of encryption against critical attacks (or more generally against adaptive
chosen-ciphertext attacks) by introducing appropriate conversions in the random
oracle model. In this section, we discuss which conversions are appropriate for
the McEliece PKC and which are not.

4.1 Notations 

We use the following notations in this paper: 
C(n,t) : The number of combinations of t out of n elements.

Prep(m) : Preprocessing of a message m, such as data compression, data
padding and so on. Its inverse is represented as Prep^{-1}().

Hash(x) : One-way hash function mapping an arbitrary-length binary string x to
a fixed-length binary string. When the output domain is Z_N where
N = C(n,t), we use Hash_z(x) instead of Hash(x).

Conv(z̄) : Bijective function which converts an integer z̄ ∈ Z_N, where
N = C(n,t), into the corresponding error vector z. Its inverse
is represented as Conv^{-1}().

Gen(x) : Generator of a cryptographically secure pseudo-random sequence
of arbitrary length from a fixed-length seed x.

Len(x) : Bit-length of x.

Msb_{x1}(x2) : The left x1 bits of x2.

Lsb_{x1}(x2) : The right x1 bits of x2.

Const : Predetermined public constant.

Rand : Random source which generates a truly random (or computationally
indistinguishable pseudo-random) sequence.

E^McEliece(x, z) : Encryption of x using the original McEliece PKC with an
error vector z.

D^McEliece(x) : Decryption of x using the original McEliece PKC.
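
The paper does not fix a particular bijection for Conv. One possible
realization, shown here only as a sketch under that assumption, is the
combinatorial number system, which maps an integer in Z_N with N = C(n,t) to a
binary vector of length n and weight t. The function names conv and conv_inv
are chosen for this example.

```python
from math import comb


def conv(z_bar, n, t):
    """Map an integer 0 <= z_bar < C(n, t) to a weight-t bit list of length n.

    Uses the combinatorial number system: z_bar = sum of C(c_i, i) with
    c_t > ... > c_1 >= 0, where the c_i are the positions of the ones.
    """
    if not 0 <= z_bar < comb(n, t):
        raise ValueError("z_bar out of range")
    bits = [0] * n
    c = n - 1
    for i in range(t, 0, -1):
        while comb(c, i) > z_bar:   # largest c with C(c, i) <= z_bar
            c -= 1
        bits[c] = 1
        z_bar -= comb(c, i)
        c -= 1
    return bits


def conv_inv(bits):
    """Inverse of conv: recover the integer from the positions of the ones."""
    positions = [i for i, b in enumerate(bits) if b]
    return sum(comb(c, i) for i, c in enumerate(positions, start=1))


example = conv(9, 5, 2)
```

For realistic parameters such as (n, t) = (1024, 38) the same code works on
Python's arbitrary-precision integers, although a production implementation
would want a faster unranking method.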

4.2 Insufficient Conversions for McEliece PKC 

OAEP Conversion. In [2], Bellare and Rogaway proposed a generic conversion
called OAEP (Optimal Asymmetric Encryption Padding) which converts an
OWTP (One-Way Trapdoor Permutation), such as the RSA primitive, into a PKC
which is indistinguishable against adaptive chosen-ciphertext attacks (CCA2).
The McEliece PKC with this OAEP conversion is given in Fig. 1 (footnote 4).
Unfortunately, this conversion does not work correctly since the reaction
attack is still applicable. This does not mean that the OAEP conversion is
flawed; rather, the McEliece primitive is not a permutation.

Fujisaki-Okamoto's Simple Conversion. In [8], Fujisaki and Okamoto proposed
a generic and simple conversion from a PKC which is indistinguishable
against CPA (Chosen-Plaintext Attacks) into a PKC which is indistinguishable
against CCA2. The McEliece PKC with this conversion is given in Fig. 2
(footnote 4). Unfortunately, this conversion does not work correctly since the
known-partial-plaintext attack works efficiently unless Len(r) is close to k.

This does not mean that Fujisaki-Okamoto's simple conversion is flawed; rather,
the original McEliece PKC is distinguishable even against CPA. Any passive
adversary (who does not use a decryption oracle) can guess which of the messages
m0 and m1 is the plaintext of the given ciphertext c of the original McEliece
PKC by checking whether the Hamming weight of m_b G' ⊕ c is t or not, where
b ∈ {0, 1}.
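
The Hamming-weight test described above can be sketched as follows. This is an
illustrative toy only: G_PUB is a hypothetical stand-in for G' (the [7,4]
Hamming code with t = 1), chosen because its minimum distance 3 exceeds 2t, so
the wrong candidate can never produce weight t.

```python
# Passive distinguisher for the original McEliece PKC:
# the correct candidate m_b satisfies weight(m_b G' xor c) == t.
# Assumption: toy public matrix and t = 1; names are illustrative.
G_PUB = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
T = 1


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def encode(m):
    """Compute m G' over GF(2)."""
    c = [0] * len(G_PUB[0])
    for bit, row in zip(m, G_PUB):
        if bit:
            c = xor(c, row)
    return c


def guess_b(m0, m1, c):
    """Return the index b for which m_b G' xor c has Hamming weight t."""
    for b, m in enumerate((m0, m1)):
        if sum(xor(encode(m), c)) == T:
            return b
    return None


m0, m1 = [1, 0, 0, 1], [0, 1, 1, 1]
z = [0, 0, 1, 0, 0, 0, 0]
c0 = xor(encode(m0), z)   # encryption of m0
c1 = xor(encode(m1), z)   # encryption of m1
```

No decryption oracle and no secret key are involved: the adversary only
re-encodes its own two candidate messages.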

4 Due to the limitation of pages, we omit the corresponding decryption process. 
Encryption of m:
    r, z̄ := Rand
    m̄ := Prep(m)
    y1 := (m̄||Const) ⊕ Gen(r)
    y2 := r ⊕ Hash(y1)
    z := Conv(z̄)
    c := E^McEliece((y1||y2), z)
    return c

Fig. 1. OAEP conversion + McEliece PKC


Encryption of m:
    r := Rand
    m̄ := Prep(m)
    z := Conv(Hash_z(m̄||r))
    c := E^McEliece((m̄||r), z)
    return c

Fig. 2. Fujisaki-Okamoto's simple conversion + McEliece PKC


Encryption of m:
    r, r' := Rand
    m̄ := Prep(m)
    z := Conv(Hash_z(m̄||r))
    y1 := E^McEliece(r', z)
    y2 := Gen(r') ⊕ (m̄||r)
    c := y1||y2
    return c

Fig. 3. Pointcheval's generic conversion


Encryption of m:
    r := Rand
    m̄ := Prep(m)
    z := Conv(Hash_z(m̄||r))
    y1 := E^McEliece(r, z)
    y2 := Gen(r) ⊕ m̄
    c := y1||y2
    return c

Fig. 4. Fujisaki-Okamoto's generic conversion



4.3 Generic Conversions Being Applicable to McEliece PKC 

Pointcheval's Generic Conversion. In [24], Pointcheval proposed a generic
conversion from a PTOWF (Partially Trapdoor One-Way Function) to a PKC
which is indistinguishable against CCA2.

The definition for f(x, y) to be a PTOWF is the following:

- For any polynomial-time adversary and for any given z = f(x, y), it is
computationally infeasible to get back the partial preimage x.

- With some extra secret information, it is easy to get back x.

Not only the ElGamal [7], Okamoto-Uchiyama [22], Naccache-Stern [20] and
Paillier [23] primitives, but also the McEliece primitive can be categorized as
a PTOWF. Therefore Pointcheval's generic conversion is also applicable to the
McEliece PKC with the same proof as in [24]. The McEliece PKC with this
conversion is given in Fig. 3 (footnote 4).



Fujisaki-Okamoto's Generic Conversion. In [9], Fujisaki and Okamoto proposed
a generic conversion from OWE (One-Way Encryption), which includes
both OWTP and PTOWF, into a PKC which is indistinguishable against CCA2.
Encryption of m:
    r := Rand
    m̄ := Prep(m)
    z̄ := Hash_z(r||m̄)
    (y1||y2) := Gen(z̄) ⊕ (r||m̄)
    z := Conv(z̄)
    c := E^McEliece(y1, z) || y2
    return c

Decryption of c:
    y1 := D^McEliece(Msb_n(c))
    z := Msb_n(c) ⊕ y1 G'
    z̄ := Conv^{-1}(z)
    (r||m̄) := Gen(z̄) ⊕ (y1||y2)
    If z̄ = Hash_z(r||m̄)
        return Prep^{-1}(m̄)
    Otherwise reject c

Fig. 5. Conversion α: Len(y1) = k and Len(y2) = Len(r||m̄) - k. If Len(r||m̄) = k,
remove y2.



Needless to say, the McEliece primitive can be categorized as an OWE, and
therefore their generic conversion is applicable to the McEliece PKC with the
same proof as in [9]. The McEliece PKC with this conversion (footnote 5) is
given in Fig. 4 (footnote 4).

4.4 Our Specific Conversions 

While one can design semantically secure McEliece PKCs by simply employing
the above generic conversions, they are not necessarily well suited to the
McEliece PKC. Since the block size of the McEliece PKC is larger than that of
well-known PKCs such as RSA and elliptic curve cryptosystems, the data
redundancy (defined as the difference between the bit length of a ciphertext
and that of its corresponding plaintext) becomes large. For example, for
(n, k) = (4096, 2560), the generic conversions require at least 4096 bits of
overhead data. On the other hand, our conversions described in Figs. 5 to 7
require less overhead data than the generic ones. For example, for the same
settings with Len(r) = 160 and Len(Const) = 160, our conversion γ requires
only 1040 bits. (This might still be large, but interestingly this value is
smaller than that of the original McEliece PKC.) The comparison results are
summarized in Table 1.
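
The overhead figures quoted above follow directly from the redundancy formulas
listed in Table 1. The short sketch below recomputes them; the function and key
names are illustrative, and Len(r) = Len(Const) = 160 is the setting stated in
the table's footnote.

```python
from math import comb


def floor_log2(x):
    """floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def redundancy(n, k, t, len_r=160, len_const=160):
    """Data redundancy (ciphertext bits minus plaintext bits) per scheme."""
    return {
        "pointcheval": n + len_r,
        "fujisaki_okamoto": n,
        "alpha_beta": n - k + len_r,
        "gamma": n - k + len_r + len_const - floor_log2(comb(n, t)),
        "original": n - k,
    }


for n, k, t in [(1024, 644, 38), (2048, 1289, 69), (4096, 2560, 128)]:
    print((n, k, t), redundancy(n, k, t))
```

Conversion γ gains over the original scheme whenever the error vector can
carry more than Len(r) + Len(Const) bits, which is the condition discussed in
the next paragraph.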

The point of conversion γ is that not only a plaintext but also an error
vector is taken from a part of y1 (or (y2||y1)). This makes the data overhead
even smaller than that of the original McEliece PKC when
Len(r) + Len(Const) < ⌊log2 C(n,t)⌋. A study on reducing the overhead data
(and simultaneously improving the security against related-message attacks)
was carried out in [27]. While the conversions in [27] do not provide provable
security against CCA2 (since at least either known-partial-plaintext attacks
or reaction attacks are applicable), the approach taken there to reduce the
overhead data should be appreciated.
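
To make the data flow of conversion α (Fig. 5) concrete, here is a minimal toy
sketch. Everything specific in it is an assumption made for illustration: the
[7,4] Hamming code (t = 1) replaces the Goppa code, Prep is the identity,
Gen is built from SHAKE-256, Hash_z is SHA-256 reduced modulo C(7,1) = 7, and
Len(r) = 8. It offers no security and only shows how encryption and
decryption fit together.

```python
import hashlib
import secrets

N, K = 7, 4                                        # toy code: n = 7, k = 4, t = 1
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]   # G' = [I_4 | P]
H_COLS = P + [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # columns of H = [P^T | I_3]
NUM_ERRORS = 7                                     # C(n, t) = C(7, 1)
LEN_R = 8                                          # toy Len(r)


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def encode(y1):
    """Codeword y1 G' of the toy code."""
    parity = [sum(y1[i] * P[i][j] for i in range(K)) % 2 for j in range(N - K)]
    return list(y1) + parity


def mceliece_dec(c):
    """Toy D^McEliece: correct at most one error, return the k message bits."""
    syndrome = xor(encode(c[:K])[K:], c[K:])
    c = list(c)
    if any(syndrome):
        c[H_COLS.index(syndrome)] ^= 1
    return c[:K]


def gen(z_bar, nbits):
    """Toy Gen: expand the seed z_bar into nbits pseudo-random bits."""
    d = hashlib.shake_256(b"Gen" + bytes([z_bar])).digest((nbits + 7) // 8)
    return [(byte >> i) & 1 for byte in d for i in range(8)][:nbits]


def hash_z(bits):
    """Toy Hash_z: map a bit list into Z_N with N = C(n, t)."""
    d = hashlib.sha256(b"Hash_z" + bytes(bits)).digest()
    return int.from_bytes(d, "big") % NUM_ERRORS


def conv(z_bar):
    """Toy Conv for t = 1: the unit vector with a one at position z_bar."""
    return [1 if i == z_bar else 0 for i in range(N)]


def encrypt(m_bits):
    r = [secrets.randbits(1) for _ in range(LEN_R)]
    x = r + list(m_bits)                       # (r || m), Prep omitted
    z_bar = hash_z(x)
    y = xor(gen(z_bar, len(x)), x)             # (y1 || y2)
    y1, y2 = y[:K], y[K:]
    return xor(encode(y1), conv(z_bar)) + y2   # E(y1, z) || y2


def decrypt(c):
    c1, y2 = c[:N], c[N:]
    y1 = mceliece_dec(c1)
    z = xor(c1, encode(y1))
    if sum(z) != 1:
        return None                            # reject: no valid error vector
    z_bar = z.index(1)                         # Conv^{-1}(z)
    x = xor(gen(z_bar, K + len(y2)), y1 + y2)
    if hash_z(x) != z_bar:
        return None                            # reject c
    return x[LEN_R:]


message = [1, 0, 1, 1, 0, 0, 1, 0]
ciphertext = encrypt(message)
```

With an 8-bit message the ciphertext has n + Len(r||m) - k = 19 bits, i.e.
the redundancy n - k + Len(r) = 11 bits, matching the formula for α in Table 1.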

Indistinguishability of Our Conversions. It is intuitively clear that our
conversions resist all the critical attacks in Section 3.2, since it is hard
for adversaries to abuse decryption oracles because of the difficulty of
generating an appropriate ciphertext without knowing its plaintext, and to
guess the input to the original McEliece PKC in our conversions even if they
know the plaintext to our conversions.

5 They originally proposed to use symmetric encryption (instead of Gen(r)). The
conversion described here is a variant mentioned in [9].


Encryption of m:
    r := Rand
    m̄ := Prep(m)
    y1 := Gen(r) ⊕ m̄
    y2 := r ⊕ Hash(y1)
    (y4||y3) := (y2||y1)
    z := Conv(Hash_z(r))
    c := y4 || E^McEliece(y3, z)
    return c

Decryption of c:
    y4 := Msb_{Len(c)-n}(c)
    y3 := D^McEliece(Lsb_n(c))
    z := Lsb_n(c) ⊕ y3 G'
    (y2||y1) := (y4||y3)
    r := y2 ⊕ Hash(y1)
    m̄ := Gen(r) ⊕ y1
    If Conv^{-1}(z) = Hash_z(r)
        return Prep^{-1}(m̄)
    Otherwise reject c

Fig. 6. Conversion β: Len(y3) = k and Len(y4) = Len(r||m̄) - k. If Len(r||m̄) = k,
remove y4.


Encryption of m:
    r := Rand
    m̄ := Prep(m)
    y1 := Gen(r) ⊕ (m̄||Const)
    y2 := r ⊕ Hash(y1)
    (y5||y4||y3) := (y2||y1)
    z := Conv(y4)
    c := y5 || E^McEliece(y3, z)
    return c

Decryption of c:
    y5 := Msb_{Len(c)-n}(c)
    y3 := D^McEliece(Lsb_n(c))
    z := y3 G' ⊕ Lsb_n(c)
    z̄ := Conv^{-1}(z)
    y4 := Lsb_{⌊log2 C(n,t)⌋}(z̄)
    (y2||y1) := (y5||y4||y3)
    r := y2 ⊕ Hash(y1)
    (m̄||Const') := y1 ⊕ Gen(r)
    If Const' = Const
        return Prep^{-1}(m̄)
    Otherwise reject c

Fig. 7. Conversion γ: Len(y3) = k, Len(y4) = ⌊log2 C(n,t)⌋, Len(y5) = Len(m̄) +
Len(Const) + Len(r) - Len(y4) - k. If Len(m̄) + Len(Const) + Len(r) = Len(y4) + k,
remove y5.

More formally, the following theorem is true for our conversions in the random
oracle model (where Gen, Hash and Hash_z are assumed to be ideal).

Theorem 1 Breaking the indistinguishability of encryption of our specific
conversions in the adaptive-chosen-ciphertext scenario is polynomially
equivalent to decrypting the whole plaintext of an arbitrarily given ciphertext
of the original McEliece PKC without any help of decryption oracles or any
knowledge of the target plaintext.
Table 1. Comparison between Data Redundancy and Conversions

Data Redundancy*1 = Ciphertext Size - Plaintext Size

Conversion              Conversion  Data Redundancy*1             (n, k) =      (n, k) =       (n, k) =
Scheme                  Type                                      (1024, 644)   (2048, 1289)   (4096, 2560)
                                                                  t = 38        t = 69         t = 128
----------------------------------------------------------------------------------------------------------
Pointcheval's [24]      Generic     n + Len(r)                    1184          2208           4256
Fujisaki-Okamoto's [9]  Generic     n                             1024          2048           4096
Our proposal α and β    Specific    n - k + Len(r)                540           919            1696
Our proposal γ          Specific    n - k + Len(r) + Len(Const)   470           648            1040
                                    - ⌊log2 C(n,t)⌋
Original McEliece       None        n - k                         380           759            1536

Complexity*2: > 2 bb tl, >2 iU1 ' 9

*1: The numerical results are obtained under the setting that Len(r) = 160 and
Len(Const) = 160.

*2: The asymptotic lower bound of the expected number of iterations to invert an
arbitrary ciphertext of the original McEliece PKC using the
finding-low-weight-codeword attack. The exact complexity is estimated in [4].



Note that, as mentioned in Section 3, it is still infeasible to decrypt the whole 
plaintext of an arbitrarily given ciphertext of the original McEliece PKC with 
appropriate parameters (without any help of decryption oracles and any knowl- 
edge on the target plaintext). 

This theorem can be proven, in the random oracle model, by showing how
to construct an algorithm which decrypts an arbitrary ciphertext of the original
McEliece PKC using an algorithm which distinguishes a ciphertext of our
converted versions in the adaptive-chosen-ciphertext scenario. (It is obvious
that an algorithm which can decrypt the original McEliece PKC can also
distinguish a ciphertext of our converted versions.) Details are described in
Appendix A.

5 Conclusion 

We carefully reviewed the currently known attacks on the McEliece PKC, and
then confirmed that, without any decryption oracle and without any partial
knowledge of the plaintext corresponding to the challenge ciphertext, no
polynomial-time algorithm is known for inverting the McEliece PKC whose
parameters are carefully chosen. Under the assumption that this inverting
problem is hard, we investigated, in the random oracle model, how to convert
this hard problem into the hard problem of breaking the indistinguishability of
encryption under CCA2. While some of the generic conversions are applicable to
the McEliece PKC, they have a disadvantage in data redundancy. A large amount
of redundant data is needed for the generic conversions since the block size of
the McEliece PKC is relatively large. Our specific conversions reduce the
redundant data to between 1/4 and 1/3 of that of the generic conversions for
practical parameters. This means that about 3K bits can be saved for n = 4096
while providing semantic security against CCA2.



Acknowledgments 

The authors would like to thank Hung-Min Sun, Pierre Loidreau, Kwangjo Kim 
and Yuliang Zheng for useful discussions and comments. 



References 

1. C. M. Adams and H. Meijer. “Security-Related Comments Regarding McEliece’s 
Public-Key Cryptosystem”. In Proc. of CRYPTO ’87, LNCS 293, pages 224-228. 
Springer-Verlag, 1988. 

2. M. Bellare and P. Rogaway. “Optimal Asymmetric Encryption”. In Proc. of 
EUROCRYPT ’94, LNCS 950, pages 92-111, 1995. 

3. T. Berson. “Failure of the McEliece Public-Key Cryptosystem Under Message- 
Resend and Related-Message Attack”. In Proc. of CRYPTO ’97, LNCS 1294, 
pages 213-220. Springer-Verlag, 1997. 

4. A. Canteaut and N. Sendrier. “Cryptanalysis of the Original McEliece
Cryptosystem”. In Proc. of ASIACRYPT ’98, pages 187-199, 1998.

5. W. Diffie and M. Hellman. “New directions in cryptography”. IEEE Trans. IT,
22(6):644-654, 1976.

6. D. Dolev, C. Dwork, and M. Naor. “Non-Malleable Cryptography”. In Proc. of
the 23rd STOC. ACM Press, 1991.

7. T. ElGamal. “A public-key cryptosystem and a signature scheme based on discrete
logarithms”. In Proc. of CRYPTO ’84, pages 10-18, 1985.

8. E. Fujisaki and T. Okamoto. “How to Enhance the Security of Public-Key En- 
cryption at Minimum Cost”. In Proc. of PKC’99, LNCS 1560, pages 53-68, 1999. 

9. E. Fujisaki and T. Okamoto. “Secure Integration of Asymmetric and Symmetric 
Encryption Schemes”. In Proc. of CRYPTO ’99, LNCS 1666, pages 535-554, 1999. 

10. J. K. Gibson. “Equivalent Goppa Codes and Trapdoors to McEliece’s Public 
Key Cryptosystem”. In Proc. of EUROCRYPT ’91, LNCS 547, pages 517-521. 
Springer-Verlag, 1991. 

11. S. Goldwasser and S. Micali. “Probabilistic encryption” . Journal of Computer and 
System Sciences, pages 270-299, 1984. 

12. C. Hall, I. Goldberg, and B. Schneier. “Reaction Attacks Against Several Public- 
Key Cryptosystems” . In Proc. of the 2nd International Conference on Information 
and Communications Security (ICICS’99), LNCS 1726, pages 2-12, 1999. 

13. K. Kobara and H. Imai. “Countermeasure against Reaction Attacks (in Japanese)” . 
In The 2000 Symposium on Cryptography and Information Security : A12, January 
2000 . 

14. V.I. Korzhik and A. I. Turkin. “Cryptanalysis of McEliece’s Public-Key Cryptosys- 
tem”. In Proc. of EUROCRYPT ’91, LNCS 547, pages 68-70. Springer-Verlag, 
1991. 







15. P. J. Lee and E. F. Brickell. “An Observation on the Security of McEliece’s Public- 
Key Cryptosystem”. In Proc. of EUROCRYPT ’88, LNCS 330 , pages 275-280. 
Springer-Verlag, 1988. 

16. P. Loidreau. “Strengthening McEliece Cryptosystem”. In Proc. of ASIACRYPT 
2000. Springer-Verlag, 2000. 

17. P. Loidreau and N. Sendrier. “Some weak keys in McEliece public-key cryptosys- 
tem” . In Proc. of IEEE International Symposium on Information Theory, ISIT 
’98, page 382, 1998. 

18. R. J. McEliece. “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. 
In Deep Space Network Progress Report, 1978. 

19. A. J. Menezes, P. C. Oorschot, and S. A. Vanstone. “McEliece public-key encryp- 
tion”. In “Handbook of Applied Cryptography”, page 299. CRC Press, 1997. 

20. D. Naccache and J. Stern. “A New Cryptosystem based on Higher Residues”. In 
Proc. of the 5th CCS, pages 59-66. ACM Press, 1998. 

21. T. Okamoto, K. Tanaka, and S. Uchiyama. “Quantum Public-Key Cryptosystems”. 
In Proc. of CRYPTO 2000, LNCS 1880, pages 147-165. Springer-Verlag, 2000. 

22. T. Okamoto and S. Uchiyama. “A New Public Key Cryptosystem as Secure as
Factoring”. In Proc. of EUROCRYPT ’98, LNCS 1403, pages 129-146, 1999.

23. P. Paillier. “Public-Key Cryptosystems Based on Discrete Logarithms Residues”. 
In Proc. of EUROCRYPT ’99, LNCS 1592, pages 223-238. Springer-Verlag, 1999. 

24. D. Pointcheval. “Chosen-Ciphertext Security for Any One-Way Cryptosystem”.
In Proc. of PKC 2000, LNCS 1751, pages 129-146. Springer-Verlag, 2000.

25. P.W. Shor. “Polynomial-Time Algorithms for Prime Factorization and Discrete 
Logarithms on a Quantum Computer”. SIAM Journal on Computing, 26:1484- 
1509, 1997. 

26. J. Stern. “A method for finding codewords of small weight”. In Proc. of Coding 
Theory and Applications , LNCS 388, pages 106-113. Springer-Verlag, 1989. 

27. H. M. Sun. “Improving the Security of the McEliece Public-Key Cryptosystem”. 
In Proc. of ASIACRYPT ’98, pages 200-213, 1998. 

28. H. M. Sun. “Further Cryptanalysis of the McEliece Public-Key Cryptosystem”.
IEEE Trans. on communication letters, 4:18-19, 2000.

29. A. Vardy. “The Intractability of Computing the Minimum Distance of a Code”.
IEEE Trans. on IT, 43:1757-1766, 1997.



A Proof of Theorem 1 

A.1 Indistinguishability of Encryption

Recall a security notion called indistinguishability of encryption [11]. In this
notion, an adversary A selects two distinct plaintexts m0 and m1 of the same
length in the find stage, and then, in the guess stage, A is given c which is the
encryption of m_b, where b is either 0 or 1 with probability 1/2. Then A
tries to guess b. The advantage of A is defined by 2Pr(Win) - 1 where Pr(Win)
denotes the expected probability of A guessing b correctly. If A has a decryption
oracle D (which decrypts any ciphertext other than the target ciphertext c),
the experiment is said to be in the adaptive-chosen-ciphertext scenario.
Otherwise, if A does not have it, the experiment is said to be in the
adaptive-chosen-plaintext scenario.
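
The find/guess experiment and the advantage 2Pr(Win) - 1 can be written as a
small test harness. The sketch below is an assumption-laden illustration: the
"scheme" is a hypothetical deterministic toy (a fixed XOR mask) chosen only so
that the adversary's success is easy to see, and no decryption oracle is
modelled.

```python
import secrets


def ind_game(encrypt, adversary_find, adversary_guess, trials=200):
    """Estimate the advantage 2*Pr(Win) - 1 over repeated experiments."""
    wins = 0
    for _ in range(trials):
        m0, m1 = adversary_find()                 # find stage
        b = secrets.randbits(1)                   # hidden bit
        c = encrypt((m0, m1)[b])                  # challenge ciphertext
        wins += int(adversary_guess(m0, m1, c) == b)   # guess stage
    return 2 * wins / trials - 1


# Hypothetical toy scheme: deterministic, hence trivially distinguishable.
def toy_encrypt(m):
    return bytes(x ^ 0x5A for x in m)


def find():
    return b"attack", b"defend"


def guess(m0, m1, c):
    # Re-encrypt one candidate and compare with the challenge.
    return 0 if toy_encrypt(m0) == c else 1


advantage = ind_game(toy_encrypt, find, guess)
```

Against a deterministic scheme this adversary always wins, so the estimated
advantage is 1; a scheme with indistinguishable encryptions would keep the
estimate close to 0.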
A.2 Random Oracle

A random oracle is an ideal hash or an ideal generator which returns truly
random numbers distributed uniformly over the output region for a new query,
but returns the same value for the same query. For such random oracles, the
following lemma is true.

Lemma 1 Suppose that f is a random oracle. Then it is impossible to get any
significant information on f(x) without asking x to the oracle, even if one knows
all the other outputs of f except the one corresponding to x.

It is obvious that Lemma 1 is true since the output value of f is truly random.
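
A simulator typically realizes such an oracle by lazy sampling: it draws a
fresh random answer for each new query and stores it in a table. The class
below is a sketch of that idea; the class and method names are assumptions for
this example, and the program method mirrors the way the algorithm B in the
proofs below fixes selected answers in advance.

```python
import secrets


class RandomOracle:
    """Lazily sampled random oracle with fixed-length byte outputs."""

    def __init__(self, out_bytes):
        self.out_bytes = out_bytes
        self.table = {}            # query -> answer, also the simulator's log

    def query(self, x: bytes) -> bytes:
        # New query: sample uniformly. Repeated query: return the stored value.
        if x not in self.table:
            self.table[x] = secrets.token_bytes(self.out_bytes)
        return self.table[x]

    def program(self, x: bytes, y: bytes) -> None:
        """Fix the answer for x before anyone has asked for it."""
        if x in self.table:
            raise ValueError("query already answered")
        self.table[x] = y


oracle = RandomOracle(16)
first = oracle.query(b"seed")
second = oracle.query(b"seed")
```

The stored table is what lets a simulator inspect which inputs the adversary
has actually asked, as used by the plaintext-extractor in Section A.3.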

A.3 Adaptive-Chosen-Ciphertext Security

The proof of Theorem 1 for conversion α is given by showing that Lemma 3 is
true. Before we show it, we first prove Lemma 2.

Lemma 2 (Adaptive-Chosen-Plaintext Security) Suppose that there exists,
for any Hash_z and any Gen, an algorithm A which accepts m0, m1 and c
of conversion α, where c is the ciphertext of m_b and b ∈ {0, 1}, asks at most
q_G queries to Gen, asks at most q_H queries to Hash_z, runs in at most t steps
and guesses b with advantage ε. Then one can design an algorithm B which accepts
a ciphertext c of the original McEliece PKC, runs in t' steps and decrypts it with
probability ε' where

\[ \epsilon' \ge \epsilon - \frac{q_H}{2^{Len(r)+1}}, \qquad t' = t + Poly(n, q_G, q_H) \]

and Poly(n, q_G, q_H) denotes a polynomial in n, q_G and q_H.
Proof. 

The algorithm B can be constructed as follows. First, B simulates both Gen and
Hash_z referred to by the algorithm A. From the assumption on A in Lemma 2,
A must be able to distinguish b with advantage ε for any Gen and any Hash_z
as long as B simulates them correctly. Then B chooses b, r and y2 at random,
and defines both Hash_z and Gen so that the ciphertext of (r||m_b) is (c||y2),
where c is the ciphertext of the original McEliece PKC which B wants to
decrypt. That is,

\[ Gen(\bar{z}) := (r \| m_b) \oplus (y_1 \| y_2) \qquad (10) \]

\[ Hash_z(r \| m_b) := Conv^{-1}(z) = \bar{z}. \qquad (11) \]

Here y1 and z denote the plaintext and the error vector of c, which are unknown
to B. For queries other than z̄ to Gen and (r||m_b) to Hash_z, the simulated
oracles return random values. Even for these Gen and Hash_z, A must be able to
distinguish b with advantage ε from the assumption in Lemma 2 as long as B
simulates them correctly (footnote 6).



6 If A distinguishes b only for certain combinations of Hash_z and Gen, then the
fault must be in either Gen or Hash_z or in both used in the combinations, and
therefore this fault can be easily removed just by avoiding those combinations.
Otherwise, i.e. if A distinguishes b for any combination of Hash_z and Gen, the
fault must be in the conversion structure.

Then, can B simulate them correctly for any queries? The answer is no: since B
does not know z̄, B cannot simulate Hash_z correctly when (r||m_b) is asked to
Hash_z. We define the following two events AskG and AskH.

Definition 1 Let AskG denote the event that z̄ is asked to Gen among the q_G
queries to Gen and this query is performed before (r||m_b) is asked to Hash_z. Let
AskH denote the event that (r||m_b) is asked to Hash_z among the q_H queries to
Hash_z and this query is performed before z̄ is asked to Gen.

Since Pr(AskG ∧ AskH) = 0 in this definition, the following holds:

\[ \Pr(\mathrm{AskG} \lor \mathrm{AskH}) = \Pr(\mathrm{AskG}) + \Pr(\mathrm{AskH}). \qquad (12) \]

Next, we estimate the upper limit of Pr(Win), the probability of A guessing
b correctly. From Lemma 1, without asking z̄ to Gen or (r||m_b) to Hash_z, one
cannot get any information on the connection between (z, y1||y2) and (r||m_b),
and therefore cannot guess b with a significant probability after the
event (¬AskG ∧ ¬AskH). After the other event, i.e. after the event (AskG ∨ AskH),
A might guess b with more significant probability. By assuming this probability
to be 1, the upper limit of Pr(Win) is obtained as follows:

\[ \Pr(\mathrm{Win}) \le \Pr(\mathrm{AskG} \lor \mathrm{AskH}) + \frac{1 - \Pr(\mathrm{AskG} \lor \mathrm{AskH})}{2} = \frac{\Pr(\mathrm{AskG} \lor \mathrm{AskH}) + 1}{2}. \qquad (13) \]

From the definition of advantage, i.e. Pr(Win) = (ε + 1)/2, the following
relationship holds:

\[ \Pr(\mathrm{AskG} \lor \mathrm{AskH}) \ge \epsilon. \qquad (14) \]

Since both r and b are chosen at random by B, A cannot know them without
asking z̄ to Gen or (r||m_b) to Hash_z. Thus the probability of one query
to Hash_z accidentally being (r||m_b) is 1/2^{Len(r)+1}, and that of at most q_H
queries is given by

\[ \Pr(\mathrm{AskH}) \le 1 - \left(1 - \frac{1}{2^{Len(r)+1}}\right)^{q_H} \le \frac{q_H}{2^{Len(r)+1}}. \qquad (15) \]

The algorithm B can simulate both Gen and Hash_z correctly unless the
event AskH happens. Then, after the event AskG, B can recover the whole
plaintext of the target ciphertext c of the original McEliece PKC using the z̄
asked to B. Thus, after the event (AskG ∧ ¬AskH), B can recover it. The lower
limit of this probability is given by
\[ \Pr(\mathrm{AskG} \land \lnot\mathrm{AskH}) = \Pr(\mathrm{AskG}) = \Pr(\mathrm{AskG} \lor \mathrm{AskH}) - \Pr(\mathrm{AskH}) \ge \epsilon - \frac{q_H}{2^{Len(r)+1}} \qquad (16) \]

from (12), (14) and (15).

The number of steps of B is at most t + (T_Dec + T_G)·q_G + T_H·q_H, where
T_Dec is the number of steps for decrypting the original McEliece PKC using a
new query to Gen as z̄, T_G is that both for checking whether a query to Gen is
new or not and for returning the corresponding value, and T_H is that both for
checking whether a query to Hash_z is new or not and for returning the
corresponding value. Since the parameters T_Dec, T_G and T_H can be written as
polynomials in n, q_G and q_H, the total number of steps of B is also a
polynomial in them.

□ 



Lemma 3 (Adaptive-Chosen-Ciphertext Security) Suppose that there exists,
for any Hash_z and Gen, an algorithm A which accepts m0, m1 and c of
conversion α, where c is the ciphertext of m_b and b ∈ {0, 1}, asks at most q_G
queries to Gen, asks at most q_H queries to Hash_z, asks at most q_D queries to a
decryption oracle D, runs in at most t steps and guesses b with advantage ε.
Then one can design an algorithm B which accepts a ciphertext c of the original
McEliece PKC, runs in t' steps and decrypts it with probability ε' where

\[ \epsilon' \ge \epsilon - \frac{q_H}{2^{Len(r)+1}} - \frac{q_D}{C(n,t)}, \qquad t' = t + Poly(n, q_G, q_H, q_D) \]

and Poly(n, q_G, q_H, q_D) denotes a polynomial in n, q_G, q_H and q_D.
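
To get a feel for the size of the loss terms, the sketch below evaluates
q_H / 2^{Len(r)+1} + q_D / C(n,t), reading the bound of Lemma 3 as
ε' ≥ ε - q_H/2^{Len(r)+1} - q_D/C(n,t). The query budgets q_H = 2^60 and
q_D = 2^40 are hypothetical values picked for illustration, and the function
name is an assumption of this example.

```python
from fractions import Fraction
from math import comb


def security_loss(q_h, q_d, len_r, n, t):
    """Exact value of q_H / 2^(Len(r)+1) + q_D / C(n, t)."""
    return Fraction(q_h, 2 ** (len_r + 1)) + Fraction(q_d, comb(n, t))


# Hypothetical adversary budget with the paper's Len(r) = 160, (n, t) = (1024, 38).
loss = security_loss(q_h=2**60, q_d=2**40, len_r=160, n=1024, t=38)
```

For these values the loss lies between 2^-101 and 2^-100, so the advantage of
A carries over to B almost unchanged; the Hash_z term dominates because C(n,t)
is far larger than 2^{Len(r)+1}.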
Proof. 

The algorithm B can be constructed as follows. First, B simulates the random
oracles Gen and Hash_z and the decryption oracle D referred to by A. As
long as B simulates them correctly, A must be able to distinguish the given
ciphertext with advantage ε. Gen and Hash_z are simulated in the same way
as in the proof of Lemma 2. The decryption oracle D can be simulated using
the following plaintext-extractor [2]. The plaintext-extractor accepts a
ciphertext, e.g. (c'||y2') where c' denotes a ciphertext of the original
McEliece PKC, and then either outputs the corresponding plaintext of (c'||y2')
or rejects it as an inappropriate ciphertext.

Let g_i and G_i denote the i-th pair of query and answer for Gen, and
let h_j and H_j denote the j-th pair of query and answer for Hash_z. From
the queries and the answers obtained while simulating Gen and Hash_z, the
plaintext-extractor finds a pair (g_i, G_i) and (h_j, H_j) satisfying
Conv(g_i) = z', Conv(H_j) = z' and G_i ⊕ (y1'||y2') = h_j, where y1' and z' denote
the plaintext and the error vector of c', respectively. If such a pair is found,
B outputs Lsb_{Len(m')}(h_j) as the plaintext of (c'||y2'), where
Len(m') = Len(c') + Len(y2') - n + k - Len(r').
Otherwise B rejects it as an inappropriate ciphertext.
The plaintext-extractor can simulate D unless A asks an appropriate ciphertext
to D without asking both z̄' = Conv^{-1}(z') to Gen and G_i ⊕ (y1'||y2') to
Hash_z. In this case, the plaintext-extractor rejects the appropriate
ciphertext, and therefore does not simulate D correctly. However, there is only
a small chance that A could generate an appropriate ciphertext without asking
them, since an appropriate ciphertext is defined as one satisfying

\[ Hash_z(Gen(\bar{z}') \oplus (y_1' \| y_2')) = \bar{z}', \qquad (17) \]

and, from Lemma 1, it is impossible for A to know whether (17) is true or not
without asking z̄' to Gen and Gen(z̄') ⊕ (y1'||y2') to Hash_z.

We define AskD as the event that at least one query out of at
most q_D queries to D accidentally becomes an appropriate ciphertext before the
queries used in (17) are asked. Since the probability of one query to D being
accidentally an appropriate ciphertext is 1/C(n,t), the upper limit of Pr(AskD)
is given by

\[ \Pr(\mathrm{AskD}) \le 1 - \left(1 - \frac{1}{C(n,t)}\right)^{q_D} \le \frac{q_D}{C(n,t)}. \qquad (18) \]
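
The last step in the bounds on Pr(AskH) and Pr(AskD) is an instance of
Bernoulli's inequality. Spelled out as a short derivation (the symbols p and q
are introduced here only for this remark):

```latex
% For 0 <= p <= 1 and an integer q >= 1, Bernoulli's inequality gives
% (1 - p)^q >= 1 - q p, hence
\[
  1 - (1 - p)^{q} \;\le\; 1 - (1 - qp) \;=\; qp .
\]
% With p = 1/2^{Len(r)+1}, q = q_H this yields the bound on Pr(AskH);
% with p = 1/C(n,t),       q = q_D this yields the bound on Pr(AskD).
```

Equivalently, this is the union bound over the q individual queries, each of
which hits the critical value with probability p.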

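Unless either AskD or AskH happens, B can correctly simulate the oracles
referred to by A. In addition, when AskG happens, B can recover the whole
plaintext of c, the ciphertext of the original McEliece PKC. The lower limit of
this probability Pr(AskG ∧ ¬AskD ∧ ¬AskH) is given by

\[
\begin{aligned}
\Pr(\mathrm{AskG} \land \lnot\mathrm{AskH} \land \lnot\mathrm{AskD})
 &= \Pr(\mathrm{AskG} \land \lnot\mathrm{AskH}) - \Pr(\mathrm{AskG} \land \lnot\mathrm{AskH} \land \mathrm{AskD}) \\
 &\ge \Pr(\mathrm{AskG} \land \lnot\mathrm{AskH}) - \Pr(\mathrm{AskD}) \\
 &\ge \epsilon - \frac{q_H}{2^{Len(r)+1}} - \frac{q_D}{C(n,t)}.
\end{aligned}
\qquad (19)
\]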



The number of steps of B is at most t + (T_Dec + T_G)·q_G + T_H·q_H + T_D·q_D,
where T_Dec, T_G and T_H are the same as the parameters in the proof of Lemma
2. T_D is the number of steps for the plaintext-extractor to verify whether (17)
holds and then to return the result. Since the parameters T_Dec, T_G, T_H and
T_D can be written as polynomials in n, q_G, q_H and q_D, the total number of
steps of B is also a polynomial in them.

□ 

Using an argument similar to that for conversion α, the lower limits of ε' for
conversions β and γ are given by

\[ \epsilon' \ge \epsilon - \frac{q_G + q_H + q_D}{2^{Len(r)}} \qquad (20) \]

and



\[ \epsilon' \ge \epsilon - \frac{q_G}{2^{Len(r)}} - \frac{q_D}{2^{Len(Const)}}, \qquad (21) \]

respectively.
Abstract. Indistinguishability against adaptive chosen ciphertext attack
(IND-CCA2) is the strongest notion for security of public key
schemes. In this paper, we present the first IND-CCA2 schemes whose
securities are equivalent to factoring n = pq under the random oracle
model, where p and q are prime numbers. Our first scheme works for
long messages and our second scheme is more efficient for short messages.



1 Introduction 

Indistinguishability against adaptive chosen ciphertext attack (IND-CCA2) is 
the strongest notion of security for public key schemes. Bellare and Rogaway 
showed that a trapdoor one-way permutation f can be converted into an
IND-CCA2 public key scheme in the random oracle model [1]. They further presented
another IND-CCA2 scheme [2], called OAEP, which is more efficient than their 
first scheme for short messages. 

RSA is believed to be a trapdoor one-way permutation. However, it is not 
known that inverting RSA is equivalent to factoring n = pq, where p and q are 
prime numbers. 

On the other hand, Okamoto and Uchiyama showed a probabilistic public 
key scheme such that inverting the encryption function is equivalent to factoring 
a special modulus n = p^2 q [3]. Fujisaki, Okamoto and then Pointcheval showed
some conversions of Okamoto and Uchiyama scheme into IND-CCA2 public key 
schemes in the random oracle model [4,5,6]. Paillier presented a trapdoor one- 
way permutation by modifying Okamoto and Uchiyama scheme [7]. 

Paillier presented a probabilistic public key scheme which is IND-CPA under 
the composite residuosity assumption [8, Sec. 4], where IND-CPA stands for in- 
distinguishability against chosen plaintext attack. Paillier also showed a variant 
of his scheme which is a trapdoor one-way permutation if and only if inverting 
RSA is hard [8, Sec. 5]. Paillier and Pointcheval gave a conversion of Paillier’s 
scheme [8, Sec. 4] into an IND-CCA2 public key scheme in the random oracle
model [9]. 

However, no IND-CCA2 scheme is known whose security is equivalent to
factoring n = pq. In this paper, we present the first IND-CCA2 schemes whose
securities are equivalent to factoring n = pq in the random oracle model by
using Kurosawa et al.'s public key cryptosystem. Our first scheme works for long
messages. Our second scheme is more efficient for short messages.

K. Kim (Ed.): PKC 2001, LNCS 1992, pp. 36-47, 2001.

Rabin's public key cryptosystem [10] is as hard as factorization. However, it
cannot be uniquely deciphered because four different plaintexts produce the same
ciphertext. Williams showed that this disadvantage can be overcome if the two
secret prime numbers, p and q, satisfy p ≡ 3 (mod 8) and q ≡ 7 (mod 8) [11].
Kurosawa et al. [12] showed a public key cryptosystem such that (i) inverting is
equivalent to factoring n = pq, (ii) the decryption is unique and (iii) p and q
are arbitrary prime numbers.
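
The four-to-one ambiguity of Rabin decryption is easy to see on a toy modulus.
The sketch below uses the hypothetical primes p = 7 and q = 11, which are far
too small for real use, and finds the square roots by brute force rather than
with the secret factors.

```python
def square_roots(c, n):
    """All x in [0, n) with x*x = c (mod n), by exhaustive search (toy sizes only)."""
    return [x for x in range(n) if (x * x) % n == c]


p, q = 7, 11          # toy primes chosen for illustration
n = p * q             # public modulus n = 77
m = 2                 # plaintext
c = (m * m) % n       # Rabin encryption: c = m^2 mod n
roots = square_roots(c, n)
```

The ciphertext c = 4 has the four square roots 2, 9, 68 and 75 modulo 77, so
the receiver cannot tell from c alone which of them was the plaintext.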

Related works: Cramer and Shoup showed an IND-CCA2 scheme in the standard
model under the decision Diffie-Hellman problem [13].

2 Preliminaries 

Let k be a security parameter. Let n(k) denote the length of a plaintext, where
n(k) is bounded by some polynomial in k.

2.1 Definitions 

Definition 1. A public key encryption scheme with a plaintext length function
n(·) is a triple of algorithms, Π = (G, E, D), where

- G, the key generation algorithm, is a probabilistic algorithm that takes a
security parameter k and returns a pair (pk, sk) of matching public and
secret keys,

- E, the encryption algorithm, is a probabilistic algorithm that takes a public
key pk and a message x ∈ {0, 1}^n to produce a ciphertext y,

- D, the decryption algorithm, is a deterministic algorithm that takes a secret
key sk and a ciphertext y to produce a message x ∈ {0, 1}^n or a special
symbol ⊥ to indicate that the ciphertext was invalid.

Our goal is to construct an encryption scheme which is indistinguishable 
secure (or semantically secure). We consider an adversary A = {A\,A 2 ) who 
runs in two stages. In the find-stage A\ is given an encryption algorithm £ and 
outputs a pair (xq, X \ ) of messages. It also outputs some string str, for example, 
its history and its inputs. In the guess-stage A 2 is given the outputs of A 1 , 
( Xq , X\) and str , and also y which is a ciphertext of a message Xb for random bit 
b. A 2 guesses a bit b' from Xq,X\ and y, and outputs a guessing bit 6'. 

A simple $A_2$ who always outputs 0 (or 1) guesses b correctly with
probability 1/2, so a success probability of 1/2 can always be achieved
trivially. We therefore measure how well A is doing by the difference between
1/2 and the probability with which $A_2$ guesses b correctly. Formally, we
define the advantage of A as follows.

$$\mathrm{Adv}_{A,\Pi}(k) = \Bigl|\, \Pr\bigl[(pk, sk) \leftarrow \mathcal{G}(1^k);\ (x_0, x_1, str) \leftarrow A_1(pk);\ b \leftarrow \{0,1\};\ y \leftarrow \mathcal{E}_{pk}(x_b) : A_2(x_0, x_1, str, y) = b\bigr] - 1/2 \,\Bigr|$$

We consider two attack models, adaptive chosen-plaintext attack and adaptive
chosen-ciphertext attack, in which an adversary can repeatedly use the
encryption oracle and the decryption oracle, respectively.
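The find/guess experiment behind the advantage can be sketched as a small test harness. This is only an illustration: the function names (`ind_game`, `estimate_advantage`) and the deliberately insecure toy scheme are assumptions made for the example, not part of the paper, and the empirical estimate only approximates the probability in the definition.

```python
import secrets

def ind_game(keygen, encrypt, find, guess, k):
    """Run one find/guess experiment; return True if the adversary guesses b."""
    pk, sk = keygen(k)
    x0, x1, state = find(pk)            # find-stage A_1
    b = secrets.randbits(1)             # hidden random bit
    y = encrypt(pk, (x0, x1)[b])        # challenge ciphertext of x_b
    return guess(x0, x1, state, y) == b  # guess-stage A_2

def estimate_advantage(keygen, encrypt, find, guess, k, trials=1000):
    """Empirical |Pr[guess == b] - 1/2| over independent experiments."""
    wins = sum(ind_game(keygen, encrypt, find, guess, k) for _ in range(trials))
    return abs(wins / trials - 0.5)

# Hypothetical toy scheme: "encryption" is the identity map, so it hides nothing.
def toy_keygen(k):
    return None, None

def toy_encrypt(pk, x):
    return x

def toy_find(pk):
    return 0, 1, None

def toy_guess(x0, x1, state, y):
    return 0 if y == x0 else 1

advantage = estimate_advantage(toy_keygen, toy_encrypt, toy_find, toy_guess, k=16, trials=200)
```

Against the identity "scheme" the adversary always wins, so the estimate reaches the maximum value 1/2; a secure scheme should keep it close to 0.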

Definition 2 (IND-CPA). Let $\Pi = (\mathcal{G}, \mathcal{E}, \mathcal{D})$ be an
encryption scheme and let $A = (A_1, A_2)$ be an adversary who can use the
encryption oracle. If

$$\mathrm{Adv}_{A,\Pi}(k) \ge \epsilon(k)$$

and A runs at most t(k) steps, we say that A $(t, \epsilon)$-breaks $\Pi(1^k)$ in
the sense of IND-CPA.

If $\mathrm{Adv}_{A,\Pi}(k)$ is negligible for any adversary A, we say that $\Pi$
is secure in the sense of IND-CPA.

Definition 3 (IND-CCA2). Let $\Pi = (\mathcal{G}, \mathcal{E}, \mathcal{D})$ be an
encryption scheme and let $A = (A_1, A_2)$ be an adversary who can use the
decryption oracle. If

$$\mathrm{Adv}_{A,\Pi}(k) \ge \epsilon(k)$$

and A runs at most t(k) steps with at most $q_D$ queries to the decryption
oracle, we say that A $(t, \epsilon, q_D)$-breaks $\Pi(1^k)$ in the sense of
IND-CCA2. If $\mathrm{Adv}_{A,\Pi}(k)$ is negligible for any adversary A, we say
that $\Pi$ is secure in the sense of IND-CCA2.

Some of the literature uses another notion, IND-CCA1, in which an adversary
$A = (A_1, A_2)$ can use the decryption oracle only in its find-stage $A_1$.
Since security in the sense of IND-CCA2 is stronger than that of IND-CCA1 (and
IND-CPA) [14], we focus on security in the sense of IND-CCA2.



2.2 Kurosawa et al’s Public Key Cryptosystem [12] 

Kurosawa et al. [12] showed a public key cryptosystem such that (i) inverting is 
equivalent to factoring n = pq, (ii) the decryption is unique and (iii) p and q are 
arbitrary prime numbers. Their scheme is described as follows. 

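Key generation algorithm $\mathcal{G}$: Choose two large primes p and q whose
lengths are both k/2 bits. The secret key is the pair of p and q.

The public key is pk = (N, c) such that

$$N = pq \quad\text{and}\quad \left(\frac{c}{p}\right) = \left(\frac{c}{q}\right) = -1,$$

where $\left(\frac{c}{p}\right)$ denotes the Legendre symbol.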

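Encryption algorithm $\mathcal{E}$: For a message $x \in \mathbb{Z}_N^*$, let

$$Y_{pk}(x) = x + c/x \bmod N,$$

$$U_{pk}(x) = \begin{cases} 0 & \text{if } \left(\frac{x}{N}\right) = 1 \\ 1 & \text{otherwise,} \end{cases}$$

$$V_{pk}(x) = \begin{cases} 0 & \text{if } x < (c/x \bmod N) \\ 1 & \text{otherwise,} \end{cases}$$

where $\left(\frac{x}{N}\right)$ denotes the Jacobi symbol. Then the ciphertext is

$$Y_{pk}(x) \,\|\, U_{pk}(x) \,\|\, V_{pk}(x),$$

where $\|$ denotes concatenation.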

Decryption algorithm $\mathcal{D}$: Suppose that a receiver is given a ciphertext
$y \| u \| v$. He first solves the following equations by using p and q.

$$x^2 - yx + c \equiv 0 \pmod p,$$
$$x^2 - yx + c \equiv 0 \pmod q.$$

He then obtains four solutions $x_1, x_2, x_3$ and $x_4$, since $Y_{pk}(x)$ is a
four-to-one function. Among the four roots, just one $x_i$ satisfies
$u = U_{pk}(x_i)$ and $v = V_{pk}(x_i)$. The receiver finally decides that this
$x_i$ is the message the sender sent.
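The scheme of this subsection can be exercised end to end with toy parameters. The following is a minimal sketch, not the authors' implementation: the primes 11 and 19 are far too small for any security, the roots modulo p and q are found by brute force (a real implementation would use a modular square root algorithm), and all function names are chosen for this illustration only. It requires Python 3.8+ for modular inverses via `pow(x, -1, N)`.

```python
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0 (Legendre symbol when n is prime)."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def keygen(p, q):
    """Public key (N, c) with (c/p) = (c/q) = -1; secret key (p, q)."""
    N = p * q
    c = next(c for c in range(2, N) if jacobi(c, p) == -1 and jacobi(c, q) == -1)
    return (N, c), (p, q)

def Y(pk, x):
    N, c = pk
    return (x + c * pow(x, -1, N)) % N

def U(pk, x):
    return 0 if jacobi(x, pk[0]) == 1 else 1

def V(pk, x):
    N, c = pk
    return 0 if x < c * pow(x, -1, N) % N else 1

def encrypt(pk, x):
    return Y(pk, x), U(pk, x), V(pk, x)

def roots(y, c, m):
    """Brute-force roots of x^2 - y*x + c = 0 (mod m); fine for toy primes only."""
    return [x for x in range(1, m) if (x * x - y * x + c) % m == 0]

def crt(a, p, b, q):
    """The unique x mod p*q with x = a (mod p) and x = b (mod q)."""
    return (a + p * ((b - a) * pow(p, -1, q) % q)) % (p * q)

def decrypt(pk, sk, ciphertext):
    y, u, v = ciphertext
    (N, c), (p, q) = pk, sk
    candidates = [crt(a, p, b, q) for a in roots(y, c, p) for b in roots(y, c, q)]
    matches = [x for x in candidates if (U(pk, x), V(pk, x)) == (u, v)]
    return matches[0] if len(matches) == 1 else None

pk, sk = keygen(11, 19)
messages = [x for x in range(1, pk[0]) if gcd(x, pk[0]) == 1]
```

With these toy values every unit x modulo N decrypts back to itself, which reflects the fact that the two bits u and v single out exactly one of the four roots.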



3 Proposed Scheme for Long Messages 

In this section, we present our first IND-CCA2 scheme whose security is
equivalent to factoring n = pq in the random oracle model, where p and q are
arbitrary prime numbers. It works for long messages. We combine Kurosawa et
al.’s scheme with Bellare and Rogaway’s scheme of [1]. Note that the conversion
of [1] requires a one-way permutation f, while $Y_{pk}(x)$ of Sec. 2.2 is a
four-to-one function.

Remember that k is a security parameter and n(k) denotes the length of a
plaintext. Let $k_0(k)$ be an integer-valued function bounded by some polynomial
in k. Let G be a mapping from k-bit strings to n-bit strings and let H be a
mapping from (n + k)-bit strings to $k_0$-bit strings. They are treated as
random oracles.

Then our scheme is described as follows. 



Key generation algorithm $\mathcal{G}$: Choose two large primes p and q whose
lengths are both k/2 bits. The secret key is the pair of p and q.

The public key is pk = (N, c) such that

$$N = pq \quad\text{and}\quad \left(\frac{c}{p}\right) = \left(\frac{c}{q}\right) = -1.$$

Encryption algorithm $\mathcal{E}$: Suppose that the input is a message x which
is an n-bit string. First, $\mathcal{E}$ chooses a random number
$r \in \mathbb{Z}_N^*$ such that $U_{pk}(r) = 0$ and $V_{pk}(r) = 0$, where

$$U_{pk}(r) = \begin{cases} 0 & \text{if } \left(\frac{r}{N}\right) = 1 \\ 1 & \text{otherwise,} \end{cases}
\qquad
V_{pk}(r) = \begin{cases} 0 & \text{if } r < (c/r \bmod N) \\ 1 & \text{otherwise.} \end{cases}$$

Let

$$Y_{pk}(r) = r + c/r \bmod N.$$

The output, which is the ciphertext of x, is given as

$$x \oplus G(r) \,\|\, Y_{pk}(r) \,\|\, H(x \| r).$$

Decryption algorithm $\mathcal{D}$: Suppose that the input is $z \| y \| s$.
$\mathcal{D}$ first solves the following equations by using p and q.

$$r^2 - yr + c \equiv 0 \pmod p,$$
$$r^2 - yr + c \equiv 0 \pmod q.$$

If there is no root, then it outputs $\perp$, which means that the input
ciphertext is illegal. Otherwise, it obtains four solutions $r_1, r_2, r_3$ and
$r_4$, since $Y_{pk}(r)$ is a four-to-one function. Among the four roots, just
one $r_i$ satisfies $U_{pk}(r_i) = 0$ and $V_{pk}(r_i) = 0$. Let us denote this
$r_i$ by r, without the subscript. It computes

$$x = z \oplus G(r).$$

If $H(x \| r) = s$ then it outputs x as the plaintext. Otherwise, it outputs
$\perp$. See Fig. 1.
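The long-message scheme can be sketched with toy parameters as follows. Everything concrete here is an assumption made for illustration: the primes 251 and 241 are insecure toy values, SHAKE-256 with domain-separation tags stands in for the random oracles G and H, messages are byte strings, and roots are found by brute force rather than by a modular square root algorithm.

```python
import hashlib
import secrets
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

P, Q = 251, 241                      # toy primes, insecure
N = P * Q
C = next(c for c in range(2, N) if jacobi(c, P) == -1 and jacobi(c, Q) == -1)

def Y(r): return (r + C * pow(r, -1, N)) % N
def U(r): return 0 if jacobi(r, N) == 1 else 1
def V(r): return 0 if r < C * pow(r, -1, N) % N else 1

def G(r, n_bytes):                   # stand-in for random oracle G
    return hashlib.shake_256(b"G" + r.to_bytes(4, "big")).digest(n_bytes)

def H(x, r):                         # stand-in for random oracle H (k_0 = 128 bits)
    return hashlib.shake_256(b"H" + x + r.to_bytes(4, "big")).digest(16)

def xor(a, b):
    return bytes(s ^ t for s, t in zip(a, b))

def encrypt(x):
    while True:                      # pick r in Z_N^* with U(r) = V(r) = 0
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1 and U(r) == 0 and V(r) == 0:
            return xor(x, G(r, len(x))), Y(r), H(x, r)

def decrypt(z, y, s):
    roots_p = [a for a in range(1, P) if (a * a - y * a + C) % P == 0]
    roots_q = [b for b in range(1, Q) if (b * b - y * b + C) % Q == 0]
    candidates = [(a + P * ((b - a) * pow(P, -1, Q) % Q)) % N
                  for a in roots_p for b in roots_q]
    good = [r for r in candidates if U(r) == 0 and V(r) == 0]
    if len(good) != 1:
        return None                  # plays the role of the invalid symbol
    x = xor(z, G(good[0], len(z)))
    return x if H(x, good[0]) == s else None

z, y, s = encrypt(b"a long message")
tampered = bytes([z[0] ^ 1]) + z[1:]
```

Decrypting (z, y, s) returns the original message, while flipping a bit of z makes the H check fail, so the sketch returns None for the tampered ciphertext.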




Fig. 1. Proposed scheme for long messages



Theorem 1. Suppose that there exists an adversary A which
$(t^{(A)}, \epsilon^{(A)}, q_D)$-breaks our first scheme in the sense of IND-CCA2
with at most $q_G$ queries to G and at most $q_H$ queries to H. Then there
exists M which runs at most $t^{(M)}$ steps and can factor N = pq with
probability $\epsilon^{(M)}$, where

$$t^{(M)} = t^{(A)} + (q_G + q_H + q_D)(T_Y(k) + \lambda(n + k)) + T_{Eu}(k),$$
$$\epsilon^{(M)} = \epsilon^{(A)}(1 - q_D 2^{-k_0})/2,$$

where $T_Y(k)$ denotes the time complexity of $Y_{pk}(x)$ and $T_{Eu}(k)$ denotes
that of gcd(x, y).

The proof of Theorem 1 will be given in the final version.

4 Proposed Scheme for Short Messages

In this section, we present our second IND-CCA2 scheme whose security is
equivalent to factoring n = pq in the random oracle model, where p and q are
arbitrary prime numbers. It is more efficient than our first scheme for short
messages. We combine Kurosawa et al.’s scheme with Bellare and Rogaway’s scheme
of [2]. Note that [2] requires a one-way permutation f, while $Y_{pk}(x)$ of
Sec. 2.2 is four-to-one.

4.1 Scheme 

Remember that k is a security parameter. Let $k_0(\cdot)$ and $k_1(\cdot)$ be
positive integer-valued functions such that $k_0(k) + k_1(k) < k$ for all
$k \ge 1$. Let $n(k) = k - k_0(k) - k_1(k)$ be the length of a plaintext.

Let G be a mapping from $k_0$-bit strings to $(n + k_1)$-bit strings and let H
be a mapping from $(n + k_1)$-bit strings to $k_0$-bit strings. They are treated
as random oracles.

Then our scheme is described as follows. 

Key generation algorithm: Choose two large primes p and q whose lengths
are both k/2 bits. The secret key is the pair of p and q.

The public key is pk = (N, c) such that

$$N = pq \quad\text{and}\quad \left(\frac{c}{p}\right) = \left(\frac{c}{q}\right) = -1.$$

Encryption algorithm: Suppose that the input message is $x \in \{0,1\}^n$.
$\mathcal{E}$ first chooses a random bit string r of length $k_0$ and computes

$$s = (x \| 0^{k_1}) \oplus G(r), \qquad t = r \oplus H(s), \qquad w = s \| t.$$

Let

$$U_{pk}(w) = \begin{cases} 0 & \text{if } \left(\frac{w}{N}\right) = 1 \\ 1 & \text{otherwise,} \end{cases}
\qquad
V_{pk}(w) = \begin{cases} 0 & \text{if } w < (c/w \bmod N) \\ 1 & \text{otherwise.} \end{cases}$$

If $(U_{pk}(w), V_{pk}(w)) = (0, 0)$, it outputs $y = Y_{pk}(w)$ as the ciphertext
of x. Otherwise, it repeats choosing r until w satisfies
$U_{pk}(w) = V_{pk}(w) = 0$.
Decryption algorithm: Suppose that the input ciphertext is $y \in \{0,1\}^k$.
$\mathcal{D}$ solves

$$w^2 - yw + c \equiv 0 \pmod p,$$
$$w^2 - yw + c \equiv 0 \pmod q.$$

If there is no solution, it outputs $\perp$. Otherwise there are four solutions
$w_1, w_2, w_3$ and $w_4$. Just one of them satisfies
$U_{pk}(w) = V_{pk}(w) = 0$. It finds such $w_i$ and sets $w_i = s \| t$, where s
and t are $n + k_1$ and $k_0$ bits, respectively. It computes
$x \| z = s \oplus G(t \oplus H(s))$, where x is an n-bit string and z is the
rest. If z is all zeros it outputs x, otherwise $\perp$.

Fig. 2. Proposed scheme for short messages

See Fig. 2.
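The padding step (s, t, w) and its inverse can be sketched separately from the number theory. This sketch covers only the padding: it omits $Y_{pk}$ and the retry loop that enforces $U_{pk}(w) = V_{pk}(w) = 0$. The bit lengths are toy values and SHAKE-256 is an assumed stand-in for the random oracles G and H; bit strings are represented as Python integers.

```python
import hashlib
import secrets

N_BITS, K0, K1 = 64, 32, 32          # toy lengths; k = n + k0 + k1 = 128

def _oracle(tag, value, in_bits, out_bits):
    """Hash-based stand-in for a random oracle on fixed-length bit strings."""
    data = tag + value.to_bytes((in_bits + 7) // 8, "big")
    digest = hashlib.shake_256(data).digest((out_bits + 7) // 8)
    return int.from_bytes(digest, "big") % (1 << out_bits)

def G(r):                            # k0 bits -> n + k1 bits
    return _oracle(b"G", r, K0, N_BITS + K1)

def H(s):                            # n + k1 bits -> k0 bits
    return _oracle(b"H", s, N_BITS + K1, K0)

def pad(x, r):
    """w = s || t with s = (x || 0^k1) xor G(r) and t = r xor H(s)."""
    s = (x << K1) ^ G(r)
    t = r ^ H(s)
    return (s << K0) | t

def unpad(w):
    """Recover x from w, or None if the k1 redundancy bits are not all zero."""
    s, t = w >> K0, w & ((1 << K0) - 1)
    r = t ^ H(s)
    xz = s ^ G(r)
    x, z = xz >> K1, xz & ((1 << K1) - 1)
    return x if z == 0 else None

x = 0x0123456789ABCDEF
w = pad(x, secrets.randbits(K0))
```

Here `unpad(w)` returns x again, and w always fits in k = 128 bits; in the full scheme w would additionally have to lie in $\mathbb{Z}_N^*$ and pass the U and V conditions before $Y_{pk}$ is applied.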

Theorem 2. Suppose that there exists an adversary A such that it
$(t^{(A)}, \epsilon^{(A)})$-breaks our second scheme in the sense of IND-CPA with
at most $q_G$ queries to G and at most $q_H$ queries to H. Then there exists M
which runs at most $t^{(M)}$ steps and can factor N = pq with probability
$\epsilon^{(M)}$, where

$$t^{(M)} = t^{(A)} + q_G q_H (T_Y(k) + \lambda k) + T_{Eu}(k),$$
$$\epsilon^{(M)} = \epsilon^{(A)}(1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)})/2 - q_G 2^{-k}$$

for some constant $\lambda$, where $T_Y(k)$ denotes the time complexity of
$Y_{pk}(x)$ and $T_{Eu}(k)$ denotes that of gcd(x, y).

Theorem 3. Suppose that there exists an adversary B such that it
$(t^{(B)}, \epsilon^{(B)}, q_D)$-breaks our second scheme in the sense of IND-CCA2
with at most $q_G$ queries to G and at most $q_H$ queries to H. Then there
exists A which $(t^{(A)}, \epsilon^{(A)})$-breaks our second scheme in the sense
of IND-CPA with at most $q_G$ queries to G and at most $q_H$ queries to H, where

$$t^{(A)} = t^{(B)} + q_G q_H q_D (T_Y(k) + \lambda k),$$
$$\epsilon^{(A)} = \epsilon^{(B)}(1 - q_D 2^{-k_1}) - q_D 2^{-k_1}/2$$

for some constant $\lambda$.

The proofs of Theorems 2 and 3 are given in the next subsections. From
Theorems 2 and 3, we straightforwardly obtain the following corollary.

Corollary 1. If there exists an adversary B such that it
$(t^{(B)}, \epsilon^{(B)}, q_D)$-breaks our public key encryption scheme in the
sense of IND-CCA2 with at most $q_G$ queries to G and at most $q_H$ queries to
H, then there exists M which runs at most $t^{(M)}$ steps and can factor N = pq
with probability $\epsilon^{(M)}$, where

$$t^{(M)} = t^{(B)} + q_G q_H (q_D + 1)(T_Y(k) + \lambda k) + T_{Eu}(k),$$
$$\epsilon^{(M)} = \epsilon^{(B)}(1 - q_D 2^{-k_1})(1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)})/2 - q_D 2^{-k_1}(1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)})/4 - q_G 2^{-k}.$$
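The corollary follows by substituting the bounds of Theorem 3 into those of Theorem 2. The worked substitution below restates both bounds as read from the theorem statements; it is a sketch of the arithmetic only, not an additional claim.

```latex
\begin{align*}
\epsilon^{(M)} &= \epsilon^{(A)}\,\frac{1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)}}{2} - q_G 2^{-k}
  && \text{(Theorem 2)} \\
\epsilon^{(A)} &= \epsilon^{(B)}\,(1 - q_D 2^{-k_1}) - \frac{q_D 2^{-k_1}}{2}
  && \text{(Theorem 3)} \\
\Longrightarrow\quad
\epsilon^{(M)} &= \epsilon^{(B)}\,(1 - q_D 2^{-k_1})\,\frac{1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)}}{2} \\
  &\quad - q_D 2^{-k_1}\,\frac{1 - q_G 2^{-k_0} - q_H 2^{-(n+k_1)}}{4} - q_G 2^{-k}, \\[4pt]
t^{(M)} &= t^{(A)} + q_G q_H\,(T_Y(k) + \lambda k) + T_{Eu}(k) \\
  &= t^{(B)} + q_G q_H q_D\,(T_Y(k) + \lambda k) + q_G q_H\,(T_Y(k) + \lambda k) + T_{Eu}(k) \\
  &= t^{(B)} + q_G q_H\,(q_D + 1)(T_Y(k) + \lambda k) + T_{Eu}(k).
\end{align*}
```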
4.2 Proof of Theorem 2

First, we construct M which factors its input N efficiently using
$A = (A_1, A_2)$. This is done by finding a random m' with (m'/N) = -1 and then
using the attacker to extract a preimage m of y = f(m') inequivalent to m' by
using the Bellare/Rogaway inversion algorithm. Since the latter does not use
the permutation property, our method works.

Firstly, M randomly chooses c from $\mathbb{Z}_N^*$ which satisfies
$\left(\frac{c}{N}\right) = 1$ and sets pk = (N, c). M randomly chooses
$\tilde{m}$ from $\mathbb{Z}_N^*$ which satisfies $U_{pk}(\tilde{m}) = 1$ and
$V_{pk}(\tilde{m}) = 0$ and computes $y = Y_{pk}(\tilde{m})$. Then it runs
$A_1(pk)$. After this, $A_1$ will make G-queries and/or H-queries. To answer
them, M prepares two empty lists, G-list and H-list, and performs the following.

G-query for g: If G-list includes an entry (g, G(g)), return G(g). Otherwise,
for every entry (h, H(h)) in H-list compute

$$m = h \,\|\, (g \oplus H(h)). \qquad (2)$$

If there exists h such that

$$y = Y_{pk}(m), \qquad U_{pk}(m) = V_{pk}(m) = 0, \qquad (3)$$

then obtain p, q from $\gcd(N, m - \tilde{m})$. Choose G(g) as a random bit
string of length $n + k_1$, add the entry (g, G(g)) to G-list, and return G(g).

H-query for h: If there is an entry (h, H(h)) in H-list, return H(h). Otherwise,
choose H(h) as a random bit string of length $k_0$ and add the entry (h, H(h))
to H-list. For every entry (g, G(g)) in G-list compute Eq. (2). If there exists
g which satisfies Eq. (3), then obtain p, q from $\gcd(N, m - \tilde{m})$.
Finally, return H(h).

Then $A_1$ will output $(x_0, x_1, str)$.

Next, M chooses a random bit b, and runs $A_2(x_0, x_1, str, y)$. If $A_2$ makes
G-queries and H-queries, M answers in the following ways.

G-query for g: If there exists an entry (g, G(g)) in G-list, return G(g).
Otherwise, compute Eq. (2) for every entry (h, H(h)) in H-list. If there exists
h which satisfies Eq. (3), then obtain p, q from $\gcd(N, m - \tilde{m})$ and
return $G(g) = (x_b \| 0^{k_1}) \oplus h$. Otherwise, choose G(g) as a random bit
string of length $n + k_1$, add the entry (g, G(g)) to G-list, and return G(g).

H-query for h: If there exists an entry (h, H(h)), return H(h). Otherwise,
choose H(h) as a random bit string of length $k_0$ and add the entry (h, H(h))
to H-list. For every entry (g, G(g)) in G-list compute Eq. (2). If there exists
g which satisfies Eq. (3), then obtain p, q from $\gcd(N, m - \tilde{m})$.
Finally, return H(h).

$A_2$ will output a bit b'. If M finds p and q then it outputs them, otherwise
it outputs $\perp$.

If (N, c) is a legal public key of the proposed scheme and there is a pair
(g, h) which satisfies Eq. (2) and (3), $\gcd(N, m - \tilde{m})$ always yields p
or q. For randomly chosen c, the probability with which (N, c) is a legal public
key is 1/2. Therefore we can estimate $\epsilon^{(M)}$, without the factor 1/2,
in the same way as in the proof of Theorem 3 in [2].

The running time of M is also similar to that in the proof of Theorem 3 in [2],
except that M needs additional time to compute p and q using gcd. When (N, c) is
not a legal public key, it is possible that A never finishes. In such a case M
waits for $t^{(A)}$ steps and then halts A.
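The factoring step used by M can be illustrated with toy numbers. In this sketch the second preimage m is found by brute force, which merely stands in for the preimage that M extracts from the adversary's oracle queries; the primes 11 and 19, the value c = 2 and the helper names are assumptions made for the example.

```python
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def Y(N, c, x):
    return (x + c * pow(x, -1, N)) % N

def factor_from_collision(N, m, m_tilde):
    """Return a nontrivial factor of N from two inequivalent preimages, if any."""
    d = gcd(N, (m - m_tilde) % N)
    return d if 1 < d < N else None

p, q, c = 11, 19, 2                  # toy legal key: (c/p) = (c/q) = -1
N = p * q

# M's own preimage has Jacobi symbol -1 (that is, U = 1).
m_tilde = next(x for x in range(2, N) if jacobi(x, N) == -1)
y = Y(N, c, m_tilde)

# A preimage with Jacobi symbol +1 (U = 0); brute force replaces the adversary.
m = next(x for x in range(1, N)
         if gcd(x, N) == 1 and jacobi(x, N) == 1 and Y(N, c, x) == y)

factor = factor_from_collision(N, m, m_tilde)
```

Because m and $\tilde{m}$ have different Jacobi symbols, they agree modulo exactly one of the two primes, so the gcd is p or q.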

4.3 Proof of Theorem 3 

First, we show how to construct $A_1/A_2$ which uses $B_1/B_2$ as a subroutine.
This is done by showing that the chosen-ciphertext attacker is answered
"invalid" with high probability for a decryption query, unless the attacker has
previously asked the random oracles G and H the queries corresponding to an
encryption of the plaintext, the knowledge of which allows the recovery of the
plaintext (i.e. the attacker is already aware of the plaintext of the
ciphertext whose decryption he is asking for).

($A_1$: find stage)

On input pk, $A_1$ initializes G-list and H-list to empty and runs $B_1(pk)$.
After this, $B_1$ will make G-queries, H-queries and decryption queries. $A_1$
answers these queries in the following ways:

G-query for g: $A_1$ makes the same query to the G oracle, obtains G(g), gives
it to $B_1$, and then adds the entry (g, G(g)) to G-list.

H-query for h: $A_1$ answers it in the same way as a G-query and adds an entry
to H-list.

The most troublesome task is to answer decryption queries for y. To simulate
the decryption oracle without knowing the secret key, $A_1$ runs Dec-simulator:

Dec-simulator: On input y, for all entries (h, H(h)) and (g, G(g)) in H-list and
G-list, compute $m = h \| (g \oplus H(h))$ and check $y = Y_{pk}(m)$ and
$U_{pk}(m) = V_{pk}(m) = 0$. If no pair passes the check, return $\perp$, which
means that y is an illegal ciphertext. Otherwise set $x \| z = h \oplus G(g)$,
where x is n bits and z is the rest. If $z = 0^{k_1}$ then return x, otherwise
$\perp$.
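The Dec-simulator amounts to a search over the recorded oracle queries. In the sketch below the two lists are dictionaries mapping a query to its answer, bit strings are integers, and $Y_{pk}$, $U_{pk}$, $V_{pk}$ are passed in as callables. The usage example replaces them with hypothetical stand-ins (identity and constant zero) purely so that the list-search logic can be followed by hand; it does not model the real functions.

```python
def dec_simulator(y, g_list, h_list, Y, U, V, k0, k1):
    """Answer a decryption query from recorded (g, G(g)) and (h, H(h)) pairs.

    Returns the plaintext x, or None (the invalid symbol) if no recorded pair
    explains the ciphertext y or if the redundancy check fails.
    """
    for h, H_h in h_list.items():
        for g, G_g in g_list.items():
            m = (h << k0) | (g ^ H_h)          # m = h || (g xor H(h))
            if Y(m) == y and U(m) == 0 and V(m) == 0:
                xz = h ^ G_g                   # x || z = h xor G(g)
                x, z = xz >> k1, xz & ((1 << k1) - 1)
                return x if z == 0 else None
    return None

# Hand-checkable example with n = k0 = k1 = 8 and hypothetical stand-ins.
identity = lambda m: m
zero = lambda m: 0
x, r = 0xAB, 0x3C
G_r = 0x1234                                   # arbitrary recorded answer G(r)
s = (x << 8) ^ G_r                             # s = (x || 0^k1) xor G(r)
H_s = 0x5A                                     # arbitrary recorded answer H(s)
w = (s << 8) | (r ^ H_s)                       # w = s || t with t = r xor H(s)

recovered = dec_simulator(w, {r: G_r}, {s: H_s}, identity, zero, zero, k0=8, k1=8)
rejected = dec_simulator(w, {}, {}, identity, zero, zero, k0=8, k1=8)
```

With both queries recorded the simulator recovers x; with empty lists it rejects, which is exactly the case analysed in the probability estimate of the proof.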

Finally, $B_1$ outputs $(x_0, x_1, str)$. $A_1$ outputs
$(x_0, x_1, str \,\|\, \text{G-list} \,\|\, \text{H-list})$.

($A_2$: guess stage)

On input $(x_0, x_1, str \,\|\, \text{G-list} \,\|\, \text{H-list}, y)$, $A_2$ runs
$B_2$ on input $(x_0, x_1, str, y)$. After this, $B_2$ will make G-queries,
H-queries and decryption queries. $A_2$ answers these queries in the same way as
$A_1$. Finally, $A_2$ outputs the bit b' or $\perp$ which $B_2$ outputs.

We will estimate the probability $\mathrm{Adv}_{A,\Pi}(k)$. Let us denote the
event "Dec-simulator can simulate the decryption oracle correctly for all
queries" by $success_D$. Clearly,

$$\Pr(b' = b) \ge \Pr(b' = b \mid success_D) \times \Pr(success_D).$$

From the definition of $\mathrm{Adv}_{A,\Pi}$ and the hypothesis of the theorem,

$$\epsilon^{(B)} = |\Pr(b' = b \mid success_D) - 1/2|.$$

W.l.o.g., we assume that

$$\epsilon^{(B)} = \Pr(b' = b \mid success_D) - 1/2.$$

Then

$$\Pr(b' = b \mid success_D) = 1/2 + \epsilon^{(B)},$$

$$\begin{aligned}
\mathrm{Adv}_{A,\Pi}(k) &= |\Pr(b' = b) - 1/2| \\
&\ge |\Pr(b' = b \mid success_D) \times \Pr(success_D) - 1/2| \\
&= |(1/2 + \epsilon^{(B)}) \Pr(success_D) - 1/2| \\
&= \left| \epsilon^{(B)} \Pr(success_D) - \frac{1 - \Pr(success_D)}{2} \right|.
\end{aligned}$$

Now we need to estimate $\Pr(success_D)$. Assume that $B_1$ or $B_2$ makes a
query for a ciphertext y. We consider three cases.

1. If there exists no m such that $y = Y_{pk}(m)$, then both Dec-simulator and
   the decryption oracle will output $\perp$, that is, Dec-simulator simulates
   correctly.

2. If there is $m = s \| t$ such that $y = Y_{pk}(m)$ and
   $U_{pk}(m) = V_{pk}(m) = 0$, and there are (g, G(g)) and (h, H(h)) in G-list
   and H-list such that $h = s$, $g = t \oplus H(h)$, Dec-simulator returns a
   correct plaintext (or $\perp$) with probability one.

3. If there is $m = s \| t$ such that $y = Y_{pk}(m)$ and
   $U_{pk}(m) = V_{pk}(m) = 0$, but there is no (g, G(g)) and/or (h, H(h)) in
   G-list and/or H-list such that $h = s$, $g = t \oplus H(h)$, Dec-simulator
   always outputs $\perp$. On the other hand, the legal decryption oracle will
   make an H-query for $h = s$ and a G-query for $g = t \oplus H(h)$, and check
   whether z is all zeros, where $x \| z = h \oplus G(g)$. Since G(g) is random,
   $h \oplus G(g)$ is random. Then the decryption oracle outputs a message with
   probability $2^{-k_1}$.

Consequently,

$$\Pr(success_D \text{ for one query}) \ge 1 - 2^{-k_1},$$

$$\Pr(success_D) \ge 1 - q_D 2^{-k_1},$$

and

$$\mathrm{Adv}_{A,\Pi}(k) \ge \epsilon^{(B)}(1 - q_D 2^{-k_1}) - q_D 2^{-k_1}/2.$$

Finally, we will estimate the time complexity. To answer one decryption query,
A needs to compute $Y_{pk}(m)$ and to compare for every h and g. Then, it is
clear that

$$t^{(A)}(k) = t^{(B)}(k) + q_G q_H q_D (T_Y(k) + \lambda k).$$

5 Discussion

In this section, we discuss some variations of our schemes.

First, we can reduce the running time of encryption by a factor of four on
average if we add two bits $(U_{pk}, V_{pk})$ to the ciphertexts. In this case,
we do not have to choose random numbers such that $U_{pk} = V_{pk} = 0$, while
the ciphertext becomes two bits longer.

Next, we can replace $Y_{pk}(r)$ with $r^2 \bmod N$. In this case, however, the
ciphertexts fail to be uniquely decrypted with small probabilities such as

$P_1 = 3/2^{k_0}$ in the scheme of Sec. 3,

$P_2 = 3/2^{k_1}$ in the scheme of Sec. 4.
Abstract. Signcryption is a public key cryptographic primitive that fulfills
the functions of digital signature and public key encryption concurrently,
with a cost smaller than that required by the traditional signature
followed by encryption method. The concept of signcryption, together 
with an implementation based on the discrete logarithm problem, was 
proposed in 1996. In this work, we demonstrate how to implement ef- 
ficient signcryption using high order (power) residues modulo an RSA 
composite. This contributes to the research of extending computational 
underpinnings of signcryption schemes to problems related to integer 
factorization. In the course of achieving our goal, we also show efficient 
protocols for user identification, and fast and compact digital signature 
schemes. 

Keywords: High Order Residues, Public Key Encryption, RSA, Signa- 
ture, Signcryption. 



1 Introduction 

The idea of using (power) residues in public key cryptography first appeared 
in [5] where Goldwasser and Micali showed how to use quadratic residues in 
randomized encryption in a bit-by-bit fashion. This early work was followed 
by Benaloh and Yung’s paper [3] where it was proposed to use r th residues, 
where r was a small 1 prime, to construct a more efficient randomized public 
key encryption scheme. In [21] Zheng, Matsumoto and Imai proved that the 
requirement of r being a small prime could be relaxed to a small odd integer. This 
was further relaxed in [9] where Kurosawa and co-workers showed that r could 
take the form of a small even integer. In [2] (see also [14]), Benaloh observed that 
one could employ the Chinese Remainder Theorem in decryption, which further 
relaxed requirements of the number r — it could be a large odd integer, provided 
that it contains only small and distinct prime factors. More recently, Paillier [15]

1 By “small” one generally means that the relevant parameter is bounded from above 
by a poly-logarithmic function of a security parameter. Likewise, a “large” parameter 
is one bounded from above by a polynomial function of a security parameter. 

K. Kim (Ed.): PKC 2001, LNCS 1992, pp. 48-63, 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001

discussed how to construct probabilistic public key encryption schemes involving
n th residues modulo n 2 , where n is an RSA composite. 

In all the public key encryption schemes presented in these successive papers, 
except those special schemes proposed in [15], decryption involves exhaustive 
search over a space whose size is dictated by a prime factor of r. This explains 
why these randomized encryption schemes do not work when r has a large prime 
factor. 

While all the prior work on the use of r th residues had been mainly limited 
to the construction of randomized public key encryption (requiring the number 
r contains only small prime factors), the present work demonstrates applica- 
tions of r th residues, with r being a large prime, in constructing efficient user 
identification, digital signature and more important, signcryption schemes. 2 

Signcryption was first proposed in [19] as an efficient public key cryptographic 
primitive that achieves both message confidentiality and non-repudiation with a 
much smaller cost than that required by digital signature followed by public key 
encryption. The first implementation of signcryption was based on the discrete 
logarithm problem over a finite field, which admitted a natural analogue on an 
elliptic curve over a finite field [20]. The same observation applies also to other 
sub-group based public key cryptosystems such as the XTR [12]. This early effort 
left as an interesting research topic to find a signcryption scheme that relies for its 
security on other computationally hard problems such as factoring large integers. 
Progress in this line of research has been made recently in [17]. Results presented 
in this paper represent yet another approach to building signcryption schemes 
on the integer factorization problem. 

Section 2 provides background knowledge on high order (power) residues. 
Section 3 shows how to construct user identification protocols (also called pass- 
port protocols) that are based on high order residues modulo an RSA composite. 
This is followed by Section 4 where the identification protocols are converted into 
efficient digital signature schemes. The usefulness of high order residues is high- 
lighted in Section 5 where a signcryption scheme called HORSE is presented. Ef- 
ficiency of the identification, signature and signcryption schemes, both in terms 
of computation costs and expanded bits, is analyzed in Section 6. Finally the 
paper is closed with a summary in Section 7. 

2 High Order (Power) Residues 

The intention of this section is to summarize some of the core mathematical
background that is required in understanding the identification, signature and
signcryption schemes to be presented in this paper. Some useful further
information on higher order residuosity can be found in [21].

2 Identification and signature schemes proposed in [10,16] rely on properties of a sub- 
group related to an RSA modulus, and hence appear to be technically different from 
the present work. Furthermore, schemes in [10] work only when r is “small”, as they 
require in their setting up stage the extraction of r th roots and search over a space 
of r elements. 
Let r and n be positive integers, and z be an integer relatively prime to
n (i.e., gcd(z, n) = 1). If there exists an integer x such that
$z = x^r \bmod n$, z is said to be an r-th (power) residue modulo n. Otherwise
z is said to be an r-th nonresidue modulo n.

The set of integers [0, 1, ..., n - 1] is denoted by $\mathbb{Z}_n$, and the set
of integers in $\mathbb{Z}_n$ that are relatively prime to n is denoted by
$\mathbb{Z}_n^*$.

We are interested in the case where r is a prime of at least 120 bits in size or
length in binary representation, and n is an RSA modulus, i.e., n = pq, where
both p and q are large (> 250 bits) primes. We further require that the three
primes r, p and q be related by

$$\gcd(r, p - 1) = r, \qquad \gcd(r, q - 1) = 1.$$

In practice, one may choose p and q in such a way that they take the form of

$$p = 2rp' + 1, \qquad q = 2q' + 1,$$

where both p' and q' are primes that are different from r.

For r and n of the above forms, an element $z \in \mathbb{Z}_n^*$ is necessarily
an r-th residue modulo q, and it is an r-th residue modulo p if and only if
$z^{(p-1)/r} \equiv 1 \bmod p$. Thus z is an r-th nonresidue modulo n if and
only if

$$z^{(p-1)/r} \not\equiv 1 \bmod p.$$

As a consequence, when the factors p and q are known, one can quickly verify
whether or not $z \in \mathbb{Z}_n^*$ is an r-th residue modulo n.

Note that 1/r of the elements $z \in \mathbb{Z}_n^*$ are r-th residues modulo n,
and the remaining (r - 1)/r of the elements are all r-th nonresidues modulo n.
This makes it easy to find an r-th nonresidue modulo n.
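The residuosity test and the 1/r density claim can be checked numerically with toy parameters. The values r = 5, p = 31 = 2·5·3 + 1 and q = 23 = 2·11 + 1 are assumptions chosen only to match the required form; real parameters would be hundreds of bits long.

```python
from math import gcd

R, P, Q = 5, 31, 23                  # toy values: p = 2*r*3 + 1, q = 2*11 + 1
N = P * Q

def is_rth_residue(z):
    """Test with the factor p: z is an r-th residue mod n iff z^((p-1)/r) = 1 mod p."""
    return pow(z, (P - 1) // R, P) == 1

units = [z for z in range(1, N) if gcd(z, N) == 1]

# Residues according to the exponent test (needs p) ...
by_test = {z for z in units if is_rth_residue(z)}
# ... and according to the definition z = x^r mod n (no factor needed).
by_definition = {pow(x, R, N) for x in units}

smallest_nonresidue = next(h for h in range(2, N)
                           if gcd(h, N) == 1 and not is_rth_residue(h))
```

Both sets coincide and contain exactly 1/r of the units, and the search for a nonresidue stops almost immediately, as the density argument predicts.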

Definition 1. We say that three integers (r, n, h) are a good triplet if they
fulfill the following requirements:

1. r is a prime whose size (length in binary representation) is at least 120 bits.

2. n = pq is an RSA modulus of at least 512 bits, satisfying gcd(r, p - 1) = r
   and gcd(r, q - 1) = 1.

3. h is an r-th nonresidue modulo n, or equivalently,
   $h^{(p-1)/r} \not\equiv 1 \bmod p$.

It should be pointed out that there is a slightly more general version of a
good triplet (r, n, h) in which r is defined as an odd integer with distinct
prime factors $r_1, r_2, \ldots, r_t$. One of the prime factors of r must be
large, say of 120 bits in binary representation. n and r are related in the
same way as in the above definition. The number h is an $r_i$-th nonresidue
modulo p for every factor $r_i$, i.e., $h^{(p-1)/r_i} \not\equiv 1 \bmod p$ for
all i = 1, 2, ..., t. The identification, signature and signcryption schemes to
be proposed in the forthcoming sections will all work with respect to such a
more general version of a good triplet, although our discussions will be
focused on the case where r is a large prime.

Fact 1 For a good triplet (r, n, h), every element $x \in \mathbb{Z}_n^*$ can be
represented as

$$x = h^i \cdot w^r \bmod n$$

for a unique integer $i \in \mathbb{Z}_r$, where
$\mathbb{Z}_r = [0, 1, \ldots, r - 1]$, and a not necessarily unique
$w \in \mathbb{Z}_n^*$. The number i is called the class-index of x.

Finding the class-index of x with respect to a good triplet appears to be
infeasible, even if one has the knowledge of the factors of n. Currently known
methods for solving the problem require the knowledge of p and q, and involve
search over $\mathbb{Z}_r$. The average computation time required by such an
algorithm is in the order of r/2, which renders the algorithm ineffective when
r is a large prime. It should be pointed out that two of the classical methods
for solving the discrete logarithm problem in a group, namely Shanks’
baby-step giant-step method and Pollard’s rho method (see [13]), do not appear
to be applicable to the class-index problem under consideration, although both
methods run faster than exhaustive search.
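The exhaustive search described above can be sketched as follows. It uses the factor p and the identity $x^{(p-1)/r} \equiv (h^{(p-1)/r})^i \bmod p$, which follows from Fact 1 because $w^{p-1} \equiv 1 \bmod p$. The toy triplet (r, n, h) = (5, 713, 2) with p = 31 is an assumption for illustration; with a 120-bit r the loop would be hopeless, which is the point of the paragraph.

```python
def class_index(x, h, r, p):
    """Exhaustive search over Z_r for i with x = h^i * w^r (mod n), using p."""
    e = (p - 1) // r
    target = pow(x, e, p)            # equals (h^e)^i mod p
    base = pow(h, e, p)              # has order exactly r because h is a nonresidue
    acc = 1
    for i in range(r):               # about r/2 iterations on average
        if acc == target:
            return i
        acc = acc * base % p
    return None

R, P, Q, H_ = 5, 31, 23, 2           # toy good triplet (r, n, h) with n = 713
N = P * Q
w = 7                                # any unit modulo n
indices = [class_index(pow(H_, i, N) * pow(w, R, N) % N, H_, R, P) for i in range(R)]
```

For each i in $\mathbb{Z}_r$ the search recovers i from $h^i \cdot w^r \bmod n$, independently of the choice of w.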

Another fact of importance is that the degree of difficulty in solving the
class-index finding problem is not affected in any way by the choice of h, as
the problem falls into a class of problems that share an interesting property 
called random self-reducibility (with respect to h). (See [1] for more discussions 
on random self-reducibility.) This fact will be used later in designing efficient 
schemes. 

These observations form the computational basis of our new identification, 
digital signature and signcryption schemes to be presented in the coming sec- 
tions. 



3 Identification Using High Order Residues 



3.1 Basic Protocol 



At the setting up stage, a user Alice first chooses a good triplet (r, n, h). In
addition, she also chooses at random $x_a$ from
$\mathbb{Z}_r = [0, 1, \ldots, r - 1]$ and $w_a$ from $\mathbb{Z}_n^*$. Namely
$x_a \in_R \mathbb{Z}_r$ and $w_a \in_R \mathbb{Z}_n^*$, where $\in_R$ indicates
that an element is chosen uniformly at random from a set.

Alice then forms

$$y_a = \frac{1}{h^{x_a} \cdot w_a^r} \bmod n.$$

She keeps $x_a$ and $w_a$ as her private key, and publishes the triplet
(r, n, h) and $y_a$ as her public key.

Later when Alice wishes to prove to another user Bob that she is indeed Alice, 
she first forwards to Bob her (certified) public key. Bob verifies the authenticity 
of Alice’s public key, and if he is satisfied with the verification, the two users 
then engage in the protocol specified in Table 1. At the end of an execution of 
the protocol, Bob would accept Alice’s claim if and only if the verification at 
Step 4 is successful. 




52 Yuliang Zheng 



Table 1. Identification Using High Order Residues

Public key: $r, n, h, y_a$
Private key: $x_a, w_a$

Step 1 (Alice => Bob: $y$)
    Alice chooses $x \in_R \mathbb{Z}_r$ and $u \in_R \mathbb{Z}_n^*$. She then
    forms $y = h^x \cdot u^r \bmod n$ and sends y to Bob.

Step 2 (Bob => Alice: $b$)
    Bob chooses $b \in_R \mathbb{Z}_r$ and forwards it to Alice as a challenge.

Step 3 (Alice => Bob: $s, v$)
    Alice forms $s = x + b \cdot x_a$ and $v = u \cdot w_a^b \bmod n$. She then
    passes s and v over to Bob. Note that no modular operation is involved in
    the calculation of s.

Step 4
    Bob verifies whether or not $h^s \cdot v^r \cdot y_a^b = y \bmod n$.
    Bob accepts Alice if and only if the equation passes the test.
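One honest run of the four steps can be sketched with toy parameters. The triplet (r, n, h) = (5, 713, 2) and all function names are assumptions for illustration; they are far too small to be secure and only show why the verification equation holds, namely $h^{x + b x_a} (u w_a^b)^r y_a^b = h^x u^r (h^{x_a} w_a^r y_a)^b = y$.

```python
import secrets
from math import gcd

R, P, Q, H_ = 5, 31, 23, 2           # toy good triplet (r, n, h); insecure
N = P * Q

def rand_unit():
    """Uniformly random element of Z_n^*."""
    while True:
        w = secrets.randbelow(N - 1) + 1
        if gcd(w, N) == 1:
            return w

def keygen():
    x_a, w_a = secrets.randbelow(R), rand_unit()
    y_a = pow(pow(H_, x_a, N) * pow(w_a, R, N), -1, N)   # y_a = 1 / (h^x_a * w_a^r)
    return (x_a, w_a), y_a

def commit():                        # Step 1 (Alice)
    x, u = secrets.randbelow(R), rand_unit()
    return (x, u), pow(H_, x, N) * pow(u, R, N) % N

def challenge():                     # Step 2 (Bob)
    return secrets.randbelow(R)

def respond(private_key, nonce, b):  # Step 3 (Alice); s is NOT reduced
    (x_a, w_a), (x, u) = private_key, nonce
    return x + b * x_a, u * pow(w_a, b, N) % N

def verify(y_a, y, b, s, v):         # Step 4 (Bob)
    return pow(H_, s, N) * pow(v, R, N) * pow(y_a, b, N) % N == y

def run_protocol():
    private_key, y_a = keygen()
    nonce, y = commit()
    b = challenge()
    s, v = respond(private_key, nonce, b)
    return verify(y_a, y, b, s, v)

results = [run_protocol() for _ in range(20)]
```

Every honest run is accepted. The sketch does not attempt to model a cheating prover, whose success probability depends on the size of the challenge space.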



3.2 Efficient Variants 

A number of methods can be considered to improve the efficiency of the basic 
identification protocol. 



A Small h. As was discussed earlier, the computational difficulty of the
class-index finding problem is not dependent on how h, an r-th nonresidue
modulo n, is chosen. Thus a small h may be selected so that it uses less memory
and helps speed up computations involving h. For a large r, an overwhelming
majority of elements in $\mathbb{Z}_n^*$ are r-th nonresidues. Hence the
smallest r-th nonresidue h can be easily identified by verifying whether

$$h^{(p-1)/r} \not\equiv 1 \bmod p$$

for h = 2, 3, 4, ..., where p is a factor of n.

Shorter y and b. In Step 1, Alice may choose to send a hashed value of
$h^x \cdot u^r \bmod n$ to Bob, that is

$$y = \mathcal{H}(h^x \cdot u^r \bmod n),$$

where $\mathcal{H}$ is a one-way hashing function. Accordingly, the verification
by Bob in Step 4 should be modified to

$$\mathcal{H}(h^s \cdot v^r \cdot y_a^b \bmod n) = y.$$

In Step 2, Bob may send Alice a shorter b, say of 60 bits, as a challenge.
These improvements will reduce the bandwidth of messages exchanged between
Alice and Bob.



Shorter $w_a$ and u. As the generation of secure random bits may consume
substantial computational resources, Alice may choose to generate $w_a$ and u
from a smaller range, say $\mathbb{Z}_{2^{80}} = [0, 1, \ldots, 2^{80} - 1]$.

Generating $w_a$ and u deterministically. Alice may choose to generate $w_a$
and u in the following way:

$$w_a = \mathcal{H}(x_a, r, n, h),$$
$$u = \mathcal{H}(x, r, n, h),$$

where $\mathcal{H}$ is a one-way hash function. This will completely eliminate
the need of generating random bits for these two values.

Removing $w_a$ and u altogether. A more efficient variant of the protocol is
to fix $w_a$ to 1, while choosing $x_a$ from a range greater than
$\mathbb{Z}_r$. More specifically, Alice can choose

$$x_a \in_R \mathbb{Z}_{2^\ell} = [0, 1, \ldots, 2^\ell - 1],$$

where $\ell$ may be at least 40 bits longer than the size of r. Namely
$\ell \ge |r| + 40$, where $|\cdot|$ indicates the number of bits in the binary
representation of an integer. Interestingly, with this modification, the number
r will be used only in the setting up stage, but not in an identification
process afterwards. Thus r no longer needs to be made public.

At the setting up stage, Alice first chooses a good triplet (r, n, h). She then
chooses $x_a \in_R \mathbb{Z}_{2^\ell}$ and forms

$$y_a = \frac{1}{h^{x_a}} \bmod n.$$

Alice then keeps $x_a$ as her private key, and publishes n, h and $y_a$
as her public key. (Note that the use of the number r is now limited to the
generation of h.)

When Alice wishes to prove to another user Bob that she is indeed Alice, she 
first forwards to Bob her (certified) public key. Bob verifies the authenticity of 
Alice’s public key, and if he is satisfied with the verification, the two users then 
engage in the protocol described in Table 2. 



Table 2. Identification Using High Order Residues: A More Efficient Version

Public key: $n$, $h$, $y_a$
Private key (Alice): $x_a$

Step 1 (Alice to Bob): Alice chooses $x \in_R \mathbb{Z}_{2^{1.75\ell}}$ and forms
    $y = \mathcal{H}(h^x \bmod n)$
  She then sends $y$ to Bob.

Step 2 (Bob to Alice): Bob chooses a random $b$ with $|b| \approx \ell/2$ and
  forwards it to Alice as a challenge.

Step 3 (Alice to Bob): Alice forms
    $s = x + b \cdot x_a$
  and sends it to Bob. Note that no modular operation is involved in the
  calculation of $s$.

Step 4 (Bob): Bob verifies whether or not
    $\mathcal{H}(h^s \cdot y_a^b \bmod n) = y$
  Bob accepts Alice if and only if the equation passes the test.
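The four steps of Table 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the modulus below is a toy product of two Mersenne primes rather than part of a good triplet, SHA-256 stands in for the hash function, and the sizes `ell`, `1.75 * ell` and `ell / 2` follow the lengths discussed in the text. The helper names are hypothetical.

```python
import hashlib
import secrets

# Toy public parameters, assumed for illustration only (NOT a good triplet).
n = (2**31 - 1) * (2**61 - 1)   # product of two known Mersenne primes
h = 3                           # assumed base, coprime to n
ell = 64                        # assumed bit length of the private key

def H(value: int) -> bytes:
    """Stand-in one-way hash of an integer (SHA-256 is an assumption)."""
    return hashlib.sha256(str(value).encode()).digest()

# Setting up: private key x_a, public key y_a = 1 / h^{x_a} mod n.
x_a = secrets.randbits(ell)
y_a = pow(h, -x_a, n)           # negative exponent needs Python 3.8+

def commit():
    """Step 1: Alice picks a 1.75*ell-bit mask x and sends y = H(h^x mod n)."""
    x = secrets.randbits(7 * ell // 4)
    return x, H(pow(h, x, n))

def challenge():
    """Step 2: Bob picks a challenge b of about ell/2 bits."""
    return secrets.randbits(ell // 2)

def respond(x: int, b: int) -> int:
    """Step 3: s = x + b * x_a over the integers (no modular reduction)."""
    return x + b * x_a

def verify(y: bytes, b: int, s: int) -> bool:
    """Step 4: accept iff H(h^s * y_a^b mod n) equals the commitment y."""
    return H(pow(h, s, n) * pow(y_a, b, n) % n) == y
```

The check succeeds because $h^s \cdot y_a^b = h^{x + b x_a} \cdot h^{-b x_a} = h^x \pmod n$.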



As $x_a \in \mathbb{Z}_{2^\ell}$ and $\ell \ge |r| + 40$, $x_a$ can be expressed as
$x_a = x'_a + f \cdot r$ for some $0 \le x'_a < r$ and $f$. From this it follows that

$h^{x_a} \bmod n = h^{x'_a} \cdot (h^f)^r \bmod n.$

Thus the efficient protocol in Table 2 can be viewed as one obtained from the
protocol in Table 1 by letting $h^f \bmod n$ play the role of $w_a \in \mathbb{Z}_n^*$.

Note that the protocol in Table 2 has also incorporated other ideas discussed
in this section, especially on shortening $y$ and $b$. Also note that since the
$x$ chosen in Step 1 essentially plays the role of “masking” the secret key $x_a$
in Step 3, it should be sufficiently long, say $|x| \ge |x_a| + |b| + 40$. Assuming
that $|b| \approx \ell/2$ and $\ell \ge 160$, it would be adequate to have
$|x| = 1.75\ell$, since then $|x_a| + |b| + 40 \le \ell + \ell/2 + \ell/4$.

4 Digital Signature Using High Order Residues 

4.1 A General Signature Scheme 

The identification protocol described in Table 1 can be converted to a digital
signature scheme by replacing Bob with a one-way hash function.

Alice sets up all the required parameters (including both public and private
keys) in the same way as described in Section 3.1. Alice’s public key is composed
of $y_a$ and a good triplet $(r, n, h)$. Her private key is a pair of numbers $x_a$
and $w_a$ which are chosen, uniformly at random, from
$\mathbb{Z}_r = [0, 1, \ldots, r - 1]$ and $\mathbb{Z}_n^*$ respectively. The public
and private keys are related by $y_a = \dfrac{1}{h^{x_a} \cdot w_a^r} \bmod n$.

To sign a message $m$, Alice first chooses at random $x$ from $\mathbb{Z}_r$ and
$u$ from $\mathbb{Z}_n^*$. She then generates three numbers $(b, s, v)$ as her
signature on the message $m$ as follows:



$b = \mathcal{H}(h^x \cdot u^r \bmod n, m)$
$s = x + b \cdot x_a$
$v = u \cdot w_a^b \bmod n$

Here $\mathcal{H}$ is a one-way hash function, and the calculation of $s$ does not
involve a modular operation.

Given $m$ and $(b, s, v)$, one uses Alice’s public key to verify the authenticity
of the signature by checking

$\mathcal{H}(h^s \cdot v^r \cdot y_a^b \bmod n, m) = b$

The signature is deemed authentic only if the equation holds.
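As a rough illustration of the general scheme, the following Python sketch implements key generation, signing and verification. It rests on several assumptions that are not part of the paper: the modulus is a toy product of two Mersenne primes, `r = 65537` and `h = 3` are arbitrary stand-ins for a good triplet, SHA-256 truncated to 80 bits plays the role of the hash function, and the public key is taken as $y_a = (h^{x_a} w_a^r)^{-1} \bmod n$ so that the verification equation holds.

```python
import hashlib
import math
import secrets

# Toy parameters, assumed for illustration only (NOT a good triplet).
n = (2**31 - 1) * (2**61 - 1)
r = 65537
h = 3

def H(value: int, message: bytes) -> int:
    """Stand-in hash H(value, m), truncated to 80 bits (an assumption)."""
    digest = hashlib.sha256(str(value).encode() + b"|" + message).digest()
    return int.from_bytes(digest[:10], "big")

def random_unit() -> int:
    """Sample an element of Z_n^* by rejection."""
    while True:
        u = secrets.randbelow(n - 2) + 2
        if math.gcd(u, n) == 1:
            return u

def keygen():
    x_a = secrets.randbelow(r)
    w_a = random_unit()
    # y_a = 1 / (h^{x_a} * w_a^r) mod n
    y_a = pow(pow(h, x_a, n) * pow(w_a, r, n) % n, -1, n)
    return (x_a, w_a), y_a

def sign(message: bytes, private_key):
    x_a, w_a = private_key
    x = secrets.randbelow(r)
    u = random_unit()
    b = H(pow(h, x, n) * pow(u, r, n) % n, message)
    s = x + b * x_a                 # integer arithmetic, no modular reduction
    v = u * pow(w_a, b, n) % n
    return b, s, v

def verify(message: bytes, signature, y_a: int) -> bool:
    b, s, v = signature
    y = pow(h, s, n) * pow(v, r, n) * pow(y_a, b, n) % n
    return H(y, message) == b
```

Verification recovers $h^x u^r$ because $h^s v^r y_a^b = h^{x + b x_a} u^r w_a^{br} (h^{x_a} w_a^r)^{-b} \pmod n$.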

We note that while our signature scheme bears some similarities to a scheme
proposed in [7], there is an important technical difference. Namely, the scheme
in [7] requires that $p - 1$ and $q - 1$, where $p$ and $q$ are the factors of an
RSA composite $n$, share a large common divisor, which is needed in the
generation of a signature.

4.2 A More Efficient Signature Scheme 

Techniques for improving the efficiency of the identification protocol that have
been discussed in Section 3.2 can be employed to make the signature scheme
more efficient. To highlight the improvements, in the following we specify the
digital signature scheme that corresponds to the efficient identification protocol
presented in Section 3.2.

As in Section 3.2, Alice’s private key is $x_a \in_R \mathbb{Z}_{2^\ell}$, where
$\ell \ge |r| + 40$, and her public key consists of three numbers $n$, $h$ and
$y_a$. The public and private keys are related by
$y_a = \dfrac{1}{h^{x_a}} \bmod n$. There is no need to publish the number $r$,
hence it can be erased at the completion of the setting up stage.

To sign a message $m$, Alice first chooses $x \in_R \mathbb{Z}_{2^{1.75\ell}}$. She
then forms her signature on the message $m$, which is composed of two numbers
$(b, s)$, as follows:

$b = \mathcal{H}(h^x \bmod n, m)$
$s = x + b \cdot x_a$

The authenticity of Alice’s signature can be confirmed by verifying

$\mathcal{H}(h^s \cdot y_a^b \bmod n, m) = b$

Table 3 summarizes the two signature schemes. Note that with the efficient
version of the signature scheme, it is important that $|x|$ is sufficiently large,
namely, $|x| \ge |x_a| + |b| + 40$. Once again assuming that $|b| \approx \ell/2$
and $\ell \ge 160$, it would suffice to have $|x| = 1.75\ell$.



Table 3. Signature Schemes Using High Order (Power) Residues

General scheme
  Public key: $r$, $n$, $h$, $y_a$
  Private key: $x_a$, $w_a$
  Generation of signature, $m \to (m, b, s, v)$:
    $x \in_R \mathbb{Z}_r$, $u \in_R \mathbb{Z}_n^*$
    $b = \mathcal{H}(h^x \cdot u^r \bmod n, m)$
    $s = x + b \cdot x_a$
    $v = u \cdot w_a^b \bmod n$
  Verification of signature $(m, b, s, v)$:
    $y = h^s \cdot v^r \cdot y_a^b \bmod n$
    accept only when $\mathcal{H}(y, m) = b$
  Length of signature: $|\mathcal{H}(\cdot)| + |s| + |n|$

Efficient scheme
  Public key: $n$, $h$, $y_a$
  Private key: $x_a$
  Generation of signature, $m \to (m, b, s)$:
    $x \in_R \mathbb{Z}_{2^{1.75\ell}}$
    $b = \mathcal{H}(h^x \bmod n, m)$
    $s = x + b \cdot x_a$
  Verification of signature $(m, b, s)$:
    $y = h^s \cdot y_a^b \bmod n$
    accept only when $\mathcal{H}(y, m) = b$
  Length of signature: $|\mathcal{H}(\cdot)| + 1.75\ell$

5 HORSE: An Efficient High Order Residue Signcryption Engine

Using the technique for constructing signcryption schemes that was first devel- 
oped in [19], the efficient signature scheme described in Table 3 can be used to 
design a new signcryption scheme whose security is related to the hardness of 
factoring large RSA moduli. 

Like the signcryption schemes in [19], some of the parameters for HORSE are 
required to be shared among all users. The only difference with [19] is that with 
the present scheme, these shared parameters must be generated either by trusted 
authorities, possibly in a distributed manner, or by a “black-box” computer 
mimicking the function of trusted authorities. 

To be more specific, the trusted authorities choose, on behalf of all users, a
good triplet $(r, n, h)$. The authorities may also choose an integer $\ell$ so that
it is at least 40 bits longer than the size of $r$ (in binary representation).
Once $(r, n, h)$ and $\ell$ are chosen, the authorities publish $h$ and $n$, as well
as $\ell$. They then make the prime factors of $n$, i.e., $p$ and $q$, and the
number $r$ inaccessible to users. Typically, this is done by erasing all traces
of $p$, $q$ and $r$.

Alice must first set up her own pair of public and private keys $y_a$ and $x_a$.
This is done by

$x_a \in_R \mathbb{Z}_{2^\ell}$
$y_a = h^{x_a} \bmod n$

Alice publishes $y_a$ in a public key directory, while keeping $x_a$ as her
matching private key.

Likewise Bob must also set up his pair of public and private keys $y_b$ and $x_b$:

$x_b \in_R \mathbb{Z}_{2^\ell}$
$y_b = h^{x_b} \bmod n$

Table 4 summarizes the setting up of signcryption.

For Alice to signcrypt a message $m$ to be sent to Bob, she carries out the
signcryption operations detailed in Table 5. On receiving a signcrypted message
from Alice, Bob can extract the original message by following the
unsigncryption steps indicated in the same table. Note that in describing the
signcryption scheme, it is assumed that $|\mathcal{KH}(\cdot)| \approx \ell/2$ and
$\ell \ge 160$. This results in the choice of $|x| = 1.75\ell$, ensuring that
$|x| \ge |x_a| + |b| + 40$, where the keyed hash value $d$ of Table 5 plays the
role of $b$.

To close this section, we point out that the way public and private keys are
set up in the HORSE signcryption scheme also admits a system reminiscent of
the ElGamal public key encryption scheme [4].

When a user Cathy wishes to send to Bob a message $m$ in a secure way,
she first chooses $x \in_R \mathbb{Z}_{2^\ell}$ and computes
$k = \mathcal{H}(y_b^x \bmod n)$. Cathy then forms $c_1 = E_k(m)$ and
$c_2 = h^x \bmod n$, and forwards to Bob the pair $(c_1, c_2)$ as a
ciphertext of $m$. Note that there is no need for Cathy to set up her public and
private keys.
Table 4. Setting up for the Signcryption Scheme HORSE

Parameters public to all:
  $n$: a large RSA modulus (chosen by trusted authorities)
  $h$: an $r$-th nonresidue modulo $n$ (chosen by trusted authorities)
  $\ell$: size of a secret key (may be chosen by trusted authorities)
  $\mathcal{H}$: a one-way hash function with $|\mathcal{H}(\cdot)| \ge 128$
  $\mathcal{KH}$: a keyed one-way hash function with $|\mathcal{KH}(\cdot)| \ge 80$
  $(E, D)$: the encryption and decryption algorithms of a private key cipher

Alice’s keys:
  $x_a$: private key ($x_a \in_R \mathbb{Z}_{2^\ell}$)
  $y_a$: public key ($y_a = h^{x_a} \bmod n$)

Bob’s keys:
  $x_b$: private key ($x_b \in_R \mathbb{Z}_{2^\ell}$)
  $y_b$: public key ($y_b = h^{x_b} \bmod n$)



On receiving $(c_1, c_2)$ from Cathy, Bob can recover $k$ by involving his private
key $x_b$ in the computation of $k = \mathcal{H}(c_2^{x_b} \bmod n)$. He can then
proceed to extract the original message $m$ from $c_1$ by $m = D_k(c_1)$.
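The ElGamal-like encryption just described can be sketched as follows. The sketch makes assumptions that go beyond the text: the modulus is a toy product of two Mersenne primes, SHA-256 derives the session key on both sides (so that Cathy and Bob arrive at the same $k$), and a simple XOR keystream serves as a placeholder for the private key cipher $(E, D)$. It is not meant as a secure implementation.

```python
import hashlib
import secrets

# Toy public parameters, assumed for illustration only.
n = (2**31 - 1) * (2**61 - 1)
h = 3
ell = 64

def derive_key(value: int) -> bytes:
    """k = H(value); SHA-256 is an assumed stand-in for the hash function."""
    return hashlib.sha256(str(value).encode()).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher with E = D: XOR against a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[i:i + 32], block))
    return bytes(out)

# Bob's key pair: y_b = h^{x_b} mod n.
x_b = secrets.randbits(ell)
y_b = pow(h, x_b, n)

def encrypt(message: bytes, recipient_key: int):
    """Cathy: k = H(y_b^x mod n), c1 = E_k(m), c2 = h^x mod n."""
    x = secrets.randbits(ell)
    k = derive_key(pow(recipient_key, x, n))
    return xor_stream(k, message), pow(h, x, n)

def decrypt(c1: bytes, c2: int, private_key: int) -> bytes:
    """Bob: k = H(c2^{x_b} mod n), m = D_k(c1)."""
    k = derive_key(pow(c2, private_key, n))
    return xor_stream(k, c1)
```

Both parties hash the same group element, since $y_b^x = h^{x_b x} = c_2^{x_b} \pmod n$.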



6 Efficiency of the Schemes 

We examine the efficiency of the identification, signature and signcryption 
schemes in terms of computational efforts invested and communication over- 
head required. With a protocol or algorithm employing public key cryptogra- 
phy, the dominant computation is modular exponentiations involving large inte- 
gers. When computing the product of several modular exponentiations, we can 
use a very effective technique that was discussed in Knuth’s book (see Exer- 
cise 27, Pages 465 and 637 of [8]; see also [18]). The same technique was later 
re-discovered by Shamir (see the last part of [4]). 



6.1 Efficiency of Identification 

We focus on the more efficient protocol specified in Table 2. Messages
communicated between Alice and Bob are very compact:
$|\mathcal{H}(\cdot)| + 1.75\ell$ bits from Alice to Bob and $\ell/2$ bits from
Bob to Alice.

Alice needs to perform one modular exponentiation which can be pre-computed
well before the start of the protocol. Using the classical “square-and-multiply”
method, on average the exponentiation takes $1.5 \cdot 1.75\ell = 2.625\ell$
modular multiplications.

Bob needs to compute the product of two modular exponentiations. The
longer exponent $s$ has $1.75\ell$ bits. Using the fast method discussed in
Knuth’s book, Bob can complete, once again on average, the computation in
$(1 + 3/4)|s| = 1.75^2\ell \approx 3\ell$ modular multiplications.
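The $(1 + 3/4)|s|$ estimate comes from processing both exponents bit by bit in a single pass. A minimal Python sketch of this simultaneous exponentiation is given below; the function name and the multiplication counter are illustrative assumptions, not taken from the paper.

```python
def simul_exp(g: int, a: int, h: int, b: int, n: int):
    """Return (g^a * h^b mod n, count of modular multiplications).

    One squaring is done per bit position; a further multiplication is
    needed only when the current bit pair (a_i, b_i) differs from (0, 0),
    which happens with probability 3/4 for random exponents.
    """
    table = {(1, 0): g % n, (0, 1): h % n, (1, 1): g * h % n}
    acc, mults = 1, 1                     # one multiplication for g*h
    for i in range(max(a.bit_length(), b.bit_length()) - 1, -1, -1):
        acc = acc * acc % n               # squaring step
        mults += 1
        bits = ((a >> i) & 1, (b >> i) & 1)
        if bits != (0, 0):
            acc = acc * table[bits] % n   # multiply by g, h or g*h
            mults += 1
    return acc, mults
```

For two random exponents of $t$ bits this gives about $t + 0.75t = 1.75t$ multiplications, compared with about $3t$ for two separate square-and-multiply exponentiations.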
Table 5. The Signcryption Scheme HORSE

Signcryption by Alice the Sender:

$m \to (c, d, e)$

1. Pick $x \in_R \mathbb{Z}_{2^{1.75\ell}}$, and let $k = \mathcal{H}(y_b^x \bmod n)$.

2. Split $k$ into $k_1$ and $k_2$ of appropriate size.

3. $c = E_{k_1}(m)$.

4. $d = \mathcal{KH}_{k_2}(m, \text{bind\_info})$,
   where bind_info may contain, among other data, the public key (certificate)
   of the recipient, and optionally the public key (certificate) of the sender.

5. $e = x + d \cdot x_a$.

6. Send to Bob the signcrypted text $(c, d, e)$.

Unsigncryption by Bob the Recipient:

$(c, d, e) \to m$

1. Recover $k$ from $d$, $e$, $h$, $n$, $y_a$ and $x_b$:
   $k = \mathcal{H}((h^e \cdot y_a^d)^{x_b} \bmod n)$.

2. Split $k$ into $k_1$ and $k_2$.

3. $m = D_{k_1}(c)$.

4. Accept $m$ as a valid message originated from Alice only if
   $\mathcal{KH}_{k_2}(m, \text{bind\_info})$ is identical to $d$.



6.2 Efficiency of Signature 

With the fast signature scheme (the second scheme) in Table 3, the signature is
significantly shorter than an RSA signature. More specifically, the size
of our signature is $|\mathcal{H}(\cdot)| + 1.75\ell$ bits. Assuming that
$|\mathcal{H}(\cdot)| = 80$ and $\ell = 200$, the signature has only 430 bits.

The signing procedure requires one exponentiation, or
$1.5 \cdot 1.75\ell = 2.625\ell$ modular multiplications on average. This is much
faster than the generation of an RSA signature, which involves a full length
exponent.

The verification of a signature will take more time than the RSA signature
scheme with a small public key, as it requires the computation of the product of
two exponentiations, with $s$ being the longer exponent. On average, the product
takes $(1 + 3/4)|s| = 1.75^2\ell \approx 3\ell$ modular multiplications.

6.3 Efficiency of Signcryption 

The communication overhead, measured in bits, of the signcryption scheme
HORSE specified in Table 5 is

$|d| + |e| = |\mathcal{KH}(\cdot)| + 1.75\ell$

Recall that the communication overhead of the traditional RSA signature
followed by RSA encryption is

$|n_a| + |n_b|$

where $n_a$ is Alice’s RSA modulus and $n_b$ Bob’s. Clearly HORSE represents a
significant improvement over RSA.

The computational cost for signcryption is

$1.5 \cdot 1.75\ell = 2.625\ell$

modular multiplications on average. The unsigncryption operation involves the
computation of the product of two exponentiations. The two exponents are
$e \cdot x_b$ and $d \cdot x_b$. It is important to note that as $\varphi(n)$,
Euler’s $\varphi$-function, is not known to Bob, the size of the exponents cannot
be reduced. Clearly, the longer exponent is $e \cdot x_b$, which has $2.75\ell$
bits. Thus on average unsigncryption takes

$(1 + 3/4) \cdot 2.75\ell \approx 4.8\ell$

modular multiplications.

Together, signcryption and unsigncryption take

$7.4\ell$

modular multiplications.

To compare HORSE with RSA signature followed by RSA encryption, we
assume that $|n| = |n_a| = |n_b|$, that the size of $r$ and the size of an output
of the keyed hash function $\mathcal{KH}$ are related by
$|r| = 1.5|\mathcal{KH}(\cdot)|$, and that
$\ell = |r| + 0.5|\mathcal{KH}(\cdot)| = 2|\mathcal{KH}(\cdot)|$.

We further assume that the Chinese Remainder Theorem is used in RSA
decryption and signature generation, achieving the theoretically maximum speedup.
Namely we assume that the average computational cost for RSA signature
generation is $\frac{1.5}{4}|n_a| = 0.375|n_a|$ modular multiplications, and for RSA
decryption it is $\frac{1.5}{4}|n_b| = 0.375|n_b|$ modular multiplications. With RSA
encryption and signature verification, to simplify our discussions we consider
two cases, although there are numerous other possible combinations for one to
choose from in practice. These two cases are: (1) small public exponents (say of
10 bits or less), and (2) $\ell/2$-bit public exponents.

In order to examine how signcryption outperforms the signature-then-encryption
approach, we define the advantage of signcryption as $(1 - C_{sc}/C_{s+e})$,
where $C_{sc}$ indicates the cost of signcryption, while $C_{s+e}$ the cost of
signature-then-encryption. More specifically, we have



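\[
\text{advantage in average computational cost} =
\begin{cases}
1 - \dfrac{7.4\ell}{0.375(|n_a| + |n_b|)} & \text{for small public exponents} \\[2ex]
1 - \dfrac{7.4\ell}{0.375(|n_a| + |n_b|) + 1.5\ell} & \text{for } \ell/2\text{-bit public exponents}
\end{cases}
\]

\[
\text{advantage in communication overhead} = 1 - \dfrac{|\mathcal{KH}(\cdot)| + 1.75\ell}{|n_a| + |n_b|}
\]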
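The advantage figures can be recomputed directly from these cost estimates. The short Python sketch below does so under the assumptions stated in the text ($|n_a| = |n_b| = |n|$, a combined HORSE cost of $7.4\ell$, CRT-based RSA at $0.375|n|$ per private-key operation, and $1.5\ell$ extra multiplications for two $\ell/2$-bit public exponents); the function name is hypothetical.

```python
def horse_advantage(n_bits: int, ell: int, kh_bits: int):
    """Return (adv_small_exp, adv_half_ell_exp, adv_communication).

    All three values are of the form 1 - C_sc / C_{s+e}.
    """
    c_sc = 7.4 * ell                        # signcryption + unsigncryption
    rsa_private = 0.375 * (2 * n_bits)      # CRT signature + CRT decryption
    adv_small = 1 - c_sc / rsa_private
    adv_half = 1 - c_sc / (rsa_private + 1.5 * ell)
    adv_comm = 1 - (kh_bits + 1.75 * ell) / (2 * n_bits)
    return adv_small, adv_half, adv_comm
```

For example, `horse_advantage(1024, 160, 80)` reproduces the first row of Table 6 to within about 0.1 percentage point; small differences in the last digit presumably stem from rounding of the $7.4\ell$ constant.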
Table 6 demonstrates the advantages with respect to various key sizes. While
the selection of parameter sizes in Table 6 is admittedly somewhat arbitrary, we
note that it is still more conservative than a table suggested in [11].



Table 6. Advantage of Signcryption Scheme HORSE over RSA based
Signature-Then-Encryption with Small Public Exponents

Security parameters: $|n| = |n_a| = |n_b|$, $\ell$, $|\mathcal{KH}(\cdot)|$, $[|r|]$.
Columns A and B give the advantage in average computational cost for small
public exponents and for $\ell/2$-bit public exponents, respectively. Column C
gives the advantage in communication overhead.

|n|     ell   |KH(.)|   [|r|]    A         B         C
1024    160   80        [120]    -54.1%    -17.5%    82.4%
1280    176   88        [132]    -35.6%    -6.4%     84.5%
1536    176   88        [132]    -13.0%    8.0%      87.1%
1792    192   96        [144]    -5.7%     13.0%     88.0%
2048    192   96        [144]    7.6%      22.1%     89.5%
2560    208   104       [156]    19.8%     31.0%     90.9%
3072    224   112       [168]    28.1%     37.2%     91.8%
4096    256   128       [192]    38.3%     45.2%     93.0%
5120    288   144       [216]    44.5%     50.1%     93.7%
8192    320   160       [240]    61.5%     64.3%     95.6%
10240   320   160       [240]    69.2%     71.0%     96.5%



In some applications, one may wish to choose RSA public exponents that are
longer than $\ell/2$, or even of full size, while in some other applications the
Chinese Remainder Theorem may not be used in RSA decryption or signature
generation. Furthermore, one may choose to select key sizes by following the
suggestions in [11]. In all these situations, the signcryption scheme HORSE will
demonstrate even greater savings in computation time and communication overhead.



7 Concluding Remarks 

We have demonstrated applications of $r$-th power residues modulo an RSA
composite in constructing efficient identification protocols, digital signature and
signcryption schemes. A major difference between this work and prior research is
that here $r$ is a large prime, or more generally an odd integer containing a large
prime factor. Efficiency of our schemes is analyzed and compared with some
existing solutions. Of particular interest to a practitioner in public key
cryptography is the fact that the signcryption scheme HORSE is significantly more
advantageous than the traditional “signature followed by encryption” approach
using the RSA signature and encryption schemes, both in terms of computational
and communication overhead. A formal analysis of the security of the protocols
and schemes presented in this paper remains a challenging topic for future
research.

To close this paper, we summarize in Table 7 the main variants of signcryption
known currently.



Table 7. Currently Known Variants of Signcryption

No.  Computational Foundation                 Reference
1    discrete logarithm on a finite field     Zheng, CRYPTO ’97 [19]
2    discrete logarithm on an elliptic curve  Zheng, CRYPTO ’97 [19],
                                              Zheng & Imai, IPL (1998) [20]
3    factoring / residuosity                  Steinfeld & Zheng, ISW 2000 [17],
                                              Zheng, PKC ’01
4    other sub-groups (e.g., XTR)             Gong & Harn, IEEE-IT (2000) [6],
                                              Lenstra & Verheul, CRYPTO 2000 [12],
                                              Zheng, CRYPTO ’97 [19]

Abstract. We use bounds of exponential sums to show that for a wide
class of parameters the modification of the DSA signature scheme proposed
by A. K. Lenstra at Asiacrypt’96 is as secure as the original scheme.



1 Introduction 



Let $p$ and $q \ge 3$ be prime numbers with $q \mid p - 1$. As usual $\mathbb{F}_p$
and $\mathbb{F}_q$ denote fields of $p$ and $q$ elements which we assume to be
represented by the elements $\{0, \ldots, p - 1\}$ and $\{0, \ldots, q - 1\}$,
respectively.

For a rational number $z$ and $m \ge 1$ we denote by $\lfloor z \rfloor_m$ the
unique integer $a$, $0 \le a \le m - 1$, such that $a \equiv z \pmod m$ (provided
that the denominator of $z$ is relatively prime to $m$).

The Digital Signature Algorithm, or DSA, can be described in the following
way. Let $\mathcal{M}$ be the set of messages to be signed and let
$h : \mathcal{M} \to \mathbb{F}_q$ be an arbitrary hash function. Let
$g \in \mathbb{F}_p$ be a fixed element of multiplicative order $q$, that is,
$g^q = 1$, which is publicly known (as well as $p$ and $q$). Finally, fix a certain
element $\alpha \in \mathbb{F}_q^*$ which is the secret key known only to the
signer. For a message $\mu \in \mathcal{M}$ we select a random element
$k \in \mathbb{F}_q^*$ called a nonce and we define the functions

\[
r(k) = \left\lfloor \lfloor g^k \rfloor_p \right\rfloor_q
\quad\text{and}\quad
s(k, \mu) = \left\lfloor k^{-1}\bigl(h(\mu) + \alpha r(k)\bigr) \right\rfloor_q . \tag{1}
\]

The pair $(r(k), s(k, \mu))$ is the DSA signature of the message $\mu$ with nonce $k$.

Modular inversion of the nonce $k$ in (1) is a time consuming operation. To
improve the performance several inversion-free modifications of the basic scheme
have been proposed, see [13] as well as Sections 11.5.2 and 11.5.4 in [8] and
Section 20.4 of [14]. On the other hand, these schemes, although quite close
to the original DSA scheme, may not be compatible with it, see the discussion
in [6]. Thus to overcome the incompatibility problem (and a large signature size
for some of the aforementioned modifications) a very different algorithm has
been proposed in [6]. This algorithm follows the basic DSA scheme except that
the nonce $k$ is generated in a special way which allows one to generate $k$ and
$\lfloor k^{-1} \rfloor_q$ simultaneously at reasonably low computational cost.

The algorithm from [6] works as follows, in a special partial case. Given a prime
$q$ and two more integer parameters $T \ge 2$ and $m \ge 1$:

o Select independently and uniformly at random $2m$ integers
  $t_1, \ldots, t_{2m} \in [2, T]$;

o For $i = 1, \ldots, 2m$, compute $u_i = \lfloor q^{-1} \rfloor_{t_i}$ and
  $w_i = (q u_i - 1)/t_i$;

o For $i = 1, \ldots, 2m$, using the identity $t_i^{-1} \equiv -w_i \pmod q$,
  compute $v_i = \lfloor -w_i \rfloor_q$;

o Compute and output

  $\kappa = \lfloor t_1 \cdots t_m v_{m+1} \cdots v_{2m} \rfloor_q$ and
  $\lambda = \lfloor v_1 \cdots v_m t_{m+1} \cdots t_{2m} \rfloor_q$.

It is easy to see that $\lambda = \lfloor \kappa^{-1} \rfloor_q$. The efficiency of
the algorithm is based on the observation that for each arithmetic operation it
performs one of the operands is of size $T$. Furthermore, once
$\lfloor q \rfloor_{t_i}$ has been computed, the inversion required for the
computation of $u_i$ involves only numbers of size $\le T$. Thus if the bit length
of $T$ is essentially smaller than the bit length of $q$ and $m$ is reasonably
small this algorithm is faster than the standard inversion modulo $q$ using the
Extended Euclid Algorithm. The efficiency of this algorithm (and its slightly
more general form described in [6]) has been numerically verified, see [6]
and Section 4.
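The steps above can be sketched in Python as follows. The sketch assumes that $q$ is a prime larger than $T$ (so every $t_i$ is invertible modulo $q$ and $q$ is invertible modulo every $t_i$) and relies on the fact that $t_i w_i = q u_i - 1 \equiv -1 \pmod q$, so that $-w_i$ is the inverse of $t_i$ modulo $q$. The function name is hypothetical.

```python
import secrets

def nonce_with_inverse(q: int, T: int, m: int):
    """Return (kappa, lam) with kappa * lam = 1 (mod q); q prime, q > T >= 2."""
    t, v = [], []
    for _ in range(2 * m):
        t_i = 2 + secrets.randbelow(T - 1)   # uniform on [2, T]
        u_i = pow(q % t_i, -1, t_i)          # short inversion modulo t_i
        w_i = (q * u_i - 1) // t_i           # exact, since q*u_i = 1 (mod t_i)
        t.append(t_i)
        v.append(-w_i % q)                   # v_i = t_i^{-1} mod q
    kappa, lam = 1, 1
    for i in range(m):                       # first half: t into kappa, v into lam
        kappa = kappa * t[i] % q
        lam = lam * v[i] % q
    for i in range(m, 2 * m):                # second half: roles swapped
        kappa = kappa * v[i] % q
        lam = lam * t[i] % q
    return kappa, lam
```

Only inversions modulo the short integers $t_i$ are performed; no inversion modulo $q$ is needed.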

However, it has remained an open question whether this new way of generating
$k$ and $\lfloor k^{-1} \rfloor_q$ undermines the security of the DSA. In
[6, Section 3] some heuristic arguments in support of the security of the new
scheme are given. At the rump session of Asiacrypt’96, S. Vaudenay [16] presented
a partial cryptanalysis of the scheme that only affected the security if the
$t_i$ are chosen in some particularly bad way that is explicitly excluded in
[6, Section 3].

In this paper we show that using bounds of character sums one can establish
rigorous security results for the above scheme (for some values of the parameters
$T$ and $m$). In fact we show that the distribution of the generated nonce is
exponentially close to the uniform distribution. Therefore any algorithm attacking
this modification immediately implies an attack on the original scheme with
exponentially close probabilities of success.

More precisely, for $k \in \mathbb{F}_q^*$, let $P_{m,T}(k)$ be the probability that
the output $\kappa$ of the above algorithm equals $k$. We use some known bounds of
exponential sums to prove that for a wide range of parameters $T$ and $m$ the
statistical distance

\[
\Delta(m, T) = \sum_{k \in \mathbb{F}_q^*} \left| P_{m,T}(k) - \frac{1}{q - 1} \right| \tag{2}
\]

is exponentially small, namely

\[
\Delta(m, T) \le q^{-\delta}
\]

for some constant $\delta > 0$. The range of parameters allowed by this general
result does not, however, seem to be of much practical value. We show that under
the assumption of the Extended Riemann Hypothesis an essentially stronger result
can be obtained that allows parameter choices in a more realistic and practical
range.

We stress that the uniformity of distribution of the nonce $k$ is absolutely
essential. Indeed, it has been shown in the series of papers [5,10,11] that the
knowledge of some bits of $k$ can be used to break the DSA (that is, recovering
the private key $\alpha$) in polynomial time.



2 Preparations 



Let $\mathcal{X}$ be the set of multiplicative characters of the multiplicative
group $\mathbb{F}_q^*$, see Section 1 of Chapter 5 of [7]. We denote by
$\mathcal{X}^*$ the subset of non-trivial characters.

We define

\[
\sigma(T) = \max_{\chi \in \mathcal{X}^*} \left| \sum_{t=2}^{T} \chi(t) \right| .
\]

Lemma 1. For any integers $T \ge 2$ and $m \ge 1$ the bound

\[
\Delta(m, T) \le q^{1/2} \sigma(T)^{2m-1} T^{-2m+1/2}
\]

holds for the statistical distance $\Delta(m, T)$ given by (2).



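Proof. Let $N_{m,T}(k)$ be the number of sequences $t_1, \ldots, t_{2m} \in [2, T]$
for which $t_1 \cdots t_m t_{m+1}^{-1} \cdots t_{2m}^{-1} \equiv k \pmod q$. Then
$P_{m,T}(k) = N_{m,T}(k) T^{-2m}$.

From the following well-known identity

\[
\frac{1}{q-1} \sum_{\chi \in \mathcal{X}} \chi(z) =
\begin{cases} 1, & \text{if } z = 1, \\ 0, & \text{otherwise,} \end{cases}
\]

which holds for any $z \in \mathbb{F}_q^*$ (cf. [7, Theorem 5.4]), we derive

\[
N_{m,T}(k) = \frac{1}{q-1} \sum_{t_1, \ldots, t_{2m} = 2}^{T} \sum_{\chi \in \mathcal{X}}
\chi\left(t_1 \cdots t_m t_{m+1}^{-1} \cdots t_{2m}^{-1} k^{-1}\right).
\]

We remark that $\chi(\lambda^{-1}) = \overline{\chi(\lambda)}$ for
$\lambda \in \mathbb{F}_q^*$ and that $z\overline{z} = |z|^2$, where $\overline{z}$
denotes the conjugate of a complex number $z$. Therefore, changing the order of
summation, separating the term $T^{2m}/(q - 1)$ which corresponds to the trivial
character, and noting that $k^{-1}$ runs through $\mathbb{F}_q^*$ together with $k$
we obtain

\[
N_{m,T}(k) - \frac{T^{2m}}{q-1} = \frac{1}{q-1} \sum_{\chi \in \mathcal{X}^*} \chi(k)
\left| \sum_{t=2}^{T} \chi(t) \right|^{2m}.
\]

Therefore

\[
\begin{aligned}
\sum_{k \in \mathbb{F}_q^*} \left( N_{m,T}(k) - \frac{T^{2m}}{q-1} \right)^2
&= \frac{1}{(q-1)^2} \sum_{k \in \mathbb{F}_q^*} \left( \sum_{\chi \in \mathcal{X}^*} \chi(k)
\left| \sum_{t=2}^{T} \chi(t) \right|^{2m} \right)^2 \\
&= \frac{1}{(q-1)^2} \sum_{k \in \mathbb{F}_q^*} \sum_{\chi_1, \chi_2 \in \mathcal{X}^*}
\chi_1(k) \chi_2(k) \left| \sum_{t=2}^{T} \chi_1(t) \right|^{2m}
\left| \sum_{t=2}^{T} \chi_2(t) \right|^{2m} \\
&= \frac{1}{(q-1)^2} \sum_{\chi_1, \chi_2 \in \mathcal{X}^*}
\left| \sum_{t=2}^{T} \chi_1(t) \right|^{2m} \left| \sum_{t=2}^{T} \chi_2(t) \right|^{2m}
\sum_{k \in \mathbb{F}_q^*} \chi_1(k) \chi_2(k).
\end{aligned}
\]

Using that the product of two characters is a character as well and the identity

\[
\sum_{k \in \mathbb{F}_q^*} \chi(k) =
\begin{cases} q - 1, & \text{if } \chi = \chi_0, \\ 0, & \text{otherwise,} \end{cases}
\]

where $\chi_0$ is the trivial character (cf. [7, Theorem 5.4]), we see that the
inner sum vanishes unless

\[
\chi_2(k) = \chi_1(k)^{-1} = \chi_1(k^{-1}) = \overline{\chi_1(k)}, \qquad k \in \mathbb{F}_q^*,
\]

in which case it is equal to $q - 1$. Therefore

\[
\begin{aligned}
\sum_{k \in \mathbb{F}_q^*} \left( N_{m,T}(k) - \frac{T^{2m}}{q-1} \right)^2
&= \frac{1}{q-1} \sum_{\chi \in \mathcal{X}^*} \left| \sum_{t=2}^{T} \chi(t) \right|^{2m}
\left| \sum_{t=2}^{T} \overline{\chi}(t) \right|^{2m}
= \frac{1}{q-1} \sum_{\chi \in \mathcal{X}^*} \left| \sum_{t=2}^{T} \chi(t) \right|^{4m} \\
&\le \frac{\sigma(T)^{4m-2}}{q-1} \sum_{\chi \in \mathcal{X}^*} \left| \sum_{t=2}^{T} \chi(t) \right|^{2}.
\end{aligned}
\]

We have

\[
\frac{1}{q-1} \sum_{\chi \in \mathcal{X}^*} \left| \sum_{t=2}^{T} \chi(t) \right|^{2}
\le \frac{1}{q-1} \sum_{\chi \in \mathcal{X}} \left| \sum_{t=2}^{T} \chi(t) \right|^{2} = T - 1.
\]

Hence

\[
\sum_{k \in \mathbb{F}_q^*} \left| P_{m,T}(k) - \frac{1}{q-1} \right|^2
= T^{-4m} \sum_{k \in \mathbb{F}_q^*} \left( N_{m,T}(k) - \frac{T^{2m}}{q-1} \right)^2
\le \sigma(T)^{4m-2} T^{-4m+1}.
\]

From the Cauchy inequality we obtain the desired result.
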
Thus to estimate the statistical distance we need upper bounds on $\sigma(T)$. The
simplest and the most well known bound is given by the Polya-Vinogradov
inequality

\[
\sigma(T) \le q^{1/2} \ln q,
\]

see [9, Theorem 2.2], which is non-trivial only for $T \ge q^{1/2+\varepsilon}$.
However such values of $T$ are too large to be useful for our application. Instead
we use the Burgess bound, see [9, Theorem 2.3].

Lemma 2. For any $\varepsilon > 0$ there exists $\gamma > 0$ such that

\[
\sigma(T) \le T q^{-\gamma}
\]

for $T \ge q^{1/4+\varepsilon}$ and sufficiently large $q$.

It is known that the Extended Riemann Hypothesis, or ERH, implies non-trivial
upper bounds for much shorter sums. We therefore use a result that relies
on the assumption of the ERH. In particular, we use a bound which follows from
one of the results of [3].

Lemma 3. Let

\[
v = \frac{\ln T}{\ln \ln q} \to \infty.
\]

Then, assuming the ERH, the bound

\[
\sigma(T) \le T v^{-v/2 + o(v)}
\]

holds.

Proof. We recall that an integer $n \ge 1$ is called $Y$-smooth if all primes
dividing it are $\le Y$. Let $\Psi(X, Y)$ denote the total number of $Y$-smooth
numbers $\le X$. The following estimate is a substantially relaxed and simplified
version of [4, Corollary 1.3]. Let $X = Y^u$; then for any $u \to \infty$ with
$u \le Y^{1/2}$ we have the bound

\[
\Psi(X, Y) \ll X u^{-u + o(u)}. \tag{3}
\]

It has been proved in [3, Theorem 2] that

\[
\sigma(T) = O\left(\Psi\left(T, \ln^2 q \, \ln^{20} \ln q\right)\right),
\]

provided that $v \to \infty$. One easily verifies that the bound (3) can be applied
to the last function with $u = v/2 + o(v)$, producing the desired result.
3 Main Results

Now we are prepared to prove our main results.

Theorem 1. For any $\varepsilon > 0$ and $A > 0$ there exists a constant
$m_0(\varepsilon, A) > 0$ such that for any integers $T \ge 2$ and $m \ge 1$
satisfying the inequalities

\[
T \ge q^{1/4+\varepsilon} \quad\text{and}\quad m \ge m_0(\varepsilon, A)
\]

the statistical distance $\Delta(m, T)$ given by (2) satisfies the bound
$\Delta(m, T) \le q^{-A}$.

Proof. From Lemmas 1 and 2 we obtain the bound

\[
\Delta(m, T) \le q^{1/2} T^{-1/2} q^{-\gamma(2m-1)} \le q^{1/4 - \gamma(2m-1)} \le q^{-A}
\]

provided that $m \ge (4A + 1)/(8\gamma) + 1$.

Unfortunately the range of parameters allowed by Theorem 1 does not seem to
be of any practical value. However under the ERH an essentially stronger result
can be obtained.

Theorem 2. Assume the ERH. Then for any $A > 0$ and any integers $T \ge 2$
and $m \ge 1$ such that

\[
v = \frac{\ln T}{\ln \ln q} \to \infty \quad\text{and}\quad
m \ge (2A + 1) \frac{\ln q}{v \ln v} + 1
\]

for sufficiently large $q$, the statistical distance $\Delta(m, T)$ given by (2)
satisfies the bound $\Delta(m, T) \le q^{-A}$.

Proof. From Lemmas 1 and 3 we obtain the bound

\[
\Delta(m, T) \le q^{1/2} T^{-1/2} v^{-(v/2 + o(v))(2m-1)} \le q^{1/2} v^{-v(2m-1)/3}
\le q^{1/2 - (4A+2)/3} < q^{-A},
\]

provided that $q$ is large enough.

In particular, if $q$ is about $n$ bits long and $T$ is selected about $\ell$ bits
long with $\ell \ge \ln^{1+\varepsilon} n$, then for $m$ of order $n/\ln \ell$ the
algorithm of [6] generates a secure sequence of pairs $\kappa$,
$\lambda = \lfloor \kappa^{-1} \rfloor_q$. Thus the values of $T$ used in this
algorithm can be rather small.

4 Practical Considerations

In [6] it was shown that generating $k$ and $\lfloor k^{-1} \rfloor_q$
simultaneously as indicated in Section 1 and with $m = 3$ is about as fast as the
regular method of computing $\lfloor k^{-1} \rfloor_q$ given a random $k$, for the
common values $n = 160$ and $\ell = 32$ where $n$ and $\ell$ are the bit lengths of
$q$ and $T$, respectively. In the analysis of [6] it was assumed that the regular
method makes use of Lehmer’s method for the inversion. Thus, in environments where
Lehmer’s inversion is available there does not seem to be any good reason not to
generate $k$ and $\lfloor k^{-1} \rfloor_q$ in the regular way.

Lehmer’s method is about twice as fast as regular modular inversion (which is
based directly on the Extended Euclidean Algorithm) because it replaces most
of the extended precision integer divisions by floating point approximations. The
disadvantage of Lehmer’s method is, however, that it takes substantially more
code and memory than regular modular inversion (or than the method from [6]).
For computation in more restricted environments (such as a credit card chip)
where the space and size needs of Lehmer’s method cannot be met, the method
of [6] may therefore be an option, because it would be faster than regular modular
inversion, even if $m$ is taken as large as 6.

Theorem 2, however, indicates that for $n = 160$ and $\ell = 32$ security can be
guaranteed (under the ERH) only for substantially larger choices for $m$, namely
$m$ should be at least about 100. Obviously, such large $m$ severely limit the
practical applicability of the method from [6] reviewed in Section 1, assuming that
provable security of the choice of $k$ is required: implementation of the method
makes sense only if very limited space is available, and division of extended
precision integers (as required for regular modular inversion) is not available. It
should be kept in mind, however, that the results presented in this paper are
just theoretical lower bounds for the security and that in practice much smaller
values of $m$ should give satisfactory results, as also indicated in [6]. In fact
even our theoretical results can be improved and extended; some further
possibilities are indicated in Section 5. We do not present them here because our
main motivation has been to indicate a possible way to establish rigorous proofs of
security of the approach proposed in [6], rather than deriving all possible results
of this kind.

An alternative way of using the idea behind the method from [6] in the
vein of the method of [1], as informally and independently proposed by several
different people, is as follows. Compute and store $S_1 = \{t_1, \ldots, t_{2r}\}$
and the corresponding $S_2 = \{v_1, \ldots, v_{2r}\}$ for some large value of $r$ and
compute the products over the four relevant random subsets of size $m$ of
$S_1 \cup S_2$ for each pair $k$, $\lfloor k^{-1} \rfloor_q$ to be generated, where
$r$ is substantially larger than $m$. Given the successful attack (cf. [12]) on the
method from [1], however, this approach cannot be recommended.



5 Remarks 

The algorithm itself as well as all our main tools, can be extended to composite 
moduli. The only difference is that Lemma 2 holds in the present form only for 
square-free moduli, however a slightly weaker result is known in the general case 
as well (which is nontrivial for T > g 3 / 8 + £ ). 
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One can also remark that if T 2u < q for an integer v > 1 then 



E 

xe* 



E*w 

t = 2 



2v 



(q — 



where M V (T) is the number of solutions of the equation (rather than a congru- 
ence) 

t\ ... tv i • • • t 2v , H, . . . , t 2i , G [2, T ] 

which can be estimated using various number theoretic tools. In particular, the 
bound 

M V (T) < T" (1 + (v - 1) In T) 1/2-1 
has been given in [15, Lemma 4]. 

It is also worth mentioning that, under the ERH, one can improve Theorem 1 
(and Theorem 2 for larger values of T). Namely, for any e > 0, the ERH implies 
the bound 

a(T) = OiT'^q 5 ). (4) 

In fact, using the so-called “large sieve” method one can probably obtain 
quite strong unconditional results for “almost all” q rather than for all of them 
(which still suffices for cryptographic applications). 

On the other hand, there are infinitely many primes q such that for $T = O(\log q)$ 
and any $m > 1$ the statistical distance $\Delta(m,T)$ is very large. Indeed, 
it has been shown in [2] that there exists a constant $c > 0$ such that for 
infinitely many primes q the smallest quadratic non-residue modulo q is at least 
$c \log q \log\log\log q$ (under the ERH the same result is known with $c \log q \log\log q$). 
Therefore for such q, $T = \lfloor c \log q \log\log\log q \rfloor$ and any $m > 1$ we have 
$P_{m,T}(k) = 0$ whenever k is one of the $(q-1)/2$ quadratic non-residues modulo q. 
Therefore, in this case $\Delta(m,T) > 1/2$. It should be noted that a large statistical 
distance does not imply that the corresponding signature scheme is insecure. 
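
The obstruction can be observed on a toy example. The Python sketch below is only a simplified model: it assumes that k is a product of m integers from [2, T] reduced modulo q, which is our simplification of the actual generation procedure, and it uses the small prime q = 71 purely for illustration. All function names are hypothetical. Since a product of quadratic residues is again a quadratic residue, no non-residue can be reached while T stays below the least quadratic non-residue.

```python
from itertools import product
from math import prod


def is_residue(k, q):
    """Euler's criterion: k (not divisible by the odd prime q) is a square mod q."""
    return pow(k, (q - 1) // 2, q) == 1


def least_nonresidue(q):
    """Smallest positive quadratic non-residue modulo the odd prime q."""
    n = 2
    while is_residue(n, q):
        n += 1
    return n


def reachable_products(q, T, m):
    """All values of t_1 * ... * t_m mod q with every t_i in [2, T] (toy model of k)."""
    return {prod(ts) % q for ts in product(range(2, T + 1), repeat=m)}


if __name__ == "__main__":
    q, m = 71, 3
    T = least_nonresidue(q) - 1  # every t in [2, T] is a quadratic residue
    reached = reachable_products(q, T, m)
    nonresidues = [k for k in range(1, q) if not is_residue(k, q)]
    missed = [k for k in nonresidues if k not in reached]
    print(f"T = {T}: {len(missed)} of {len(nonresidues)} non-residues are never produced")
```

In this model every one of the (q - 1)/2 non-residues has probability zero, whatever the value of m, which is the source of the large statistical distance.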

A more general modification of the algorithm from [6] (where some of the ti 
and i>i are alternated in a random fashion in the expressions for k and A) can be 
studied quite analogously. 
Abstract. We describe a new general method to perform part of the set-up 
stage of the XTR system introduced at Crypto 2000, namely finding 
the trace of a generator of the XTR group. Our method is substantially 
faster than the general method presented at Asiacrypt 2000. As a side 
result, we obtain an efficient method to test subgroup membership when 
using XTR. 



1 Introduction 

XTR is an efficient and compact method to work with order $p^2 - p + 1$ subgroups 
of the multiplicative group $\mathrm{GF}(p^6)^*$ of the finite field $\mathrm{GF}(p^6)$. It was introduced 
at Crypto 2000 (cf. [4]), followed by several practical improvements at Asiacrypt 
2000 (cf. [5]). In this paper we present some further improvements of the methods 
from [4] and [5]. Given the rapidly growing interest in XTR our new methods 
are of immediate practical importance. 

Let p and q be primes such that $p \equiv 2 \bmod 3$ and q divides $p^2 - p + 1$, let g be 
a generator of the order q subgroup of $\mathrm{GF}(p^6)^*$, and let $\mathrm{Tr}(g) = g + g^{p^2} + g^{p^4} \in \mathrm{GF}(p^2)$ 
be the trace over $\mathrm{GF}(p^2)$ of g. In [4] it is shown that the conjugates over 
$\mathrm{GF}(p^2)$ of elements of the XTR group $\langle g \rangle$ can conveniently be represented by 
their trace over $\mathrm{GF}(p^2)$, and it is shown how this representation can efficiently 
be computed given Tr(g). 

Given p and q the trace of a generator of the XTR group can be found as 
follows, as shown in [4]. First one finds a value $c \in \mathrm{GF}(p^2)$ such that 
$F(c, X) = X^3 - cX^2 + c^p X - 1 \in \mathrm{GF}(p^2)[X]$ is irreducible over $\mathrm{GF}(p^2)$. Given an irreducible 
F(c, X), there exists an element $h \in \mathrm{GF}(p^6)^*$ of order > 3 and dividing 
$p^2 - p + 1$ such that Tr(h) = c. Actually, h is a root of F(c, X). This implies that 
Tr(g) can be computed as $\mathrm{Tr}(h^{(p^2-p+1)/q})$, assuming that this value is $\neq 3$; if 
$\mathrm{Tr}(h^{(p^2-p+1)/q}) = 3$ another c has to be found such that F(c, X) is irreducible. 
Because F(c, X) is irreducible for about one third of the c's in $\mathrm{GF}(p^2)$, on average 
$3q/(q-1)$ different c's have to be tried before a proper c is found. 
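
The "about one third" figure can be checked by brute force for toy primes. The Python sketch below is not the irreducibility test of [4] or of this paper. It simply searches $\mathrm{GF}(p^2)$ for roots, which decides irreducibility of a cubic but is feasible only for very small p. It assumes the representation $\mathrm{GF}(p^2) = \mathrm{GF}(p)[\alpha]/(\alpha^2 + \alpha + 1)$ with basis {1, α}, chosen here for convenience (the polynomial is irreducible because p ≡ 2 mod 3). All function names are hypothetical.

```python
def mul(u, v, p):
    """Multiply a + b*alpha by c + d*alpha, using alpha^2 = -alpha - 1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c - b * d) % p)


def add(u, v, p):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)


def sub(u, v, p):
    return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)


def frobenius(u, p):
    """u^p: since p = 2 mod 3, alpha^p = alpha^2 = -1 - alpha."""
    a, b = u
    return ((a - b) % p, (-b) % p)


def F(c, x, p):
    """Evaluate F(c, X) = X^3 - c X^2 + c^p X - 1 at X = x."""
    x2 = mul(x, x, p)
    x3 = mul(x2, x, p)
    value = sub(x3, mul(c, x2, p), p)
    value = add(value, mul(frobenius(c, p), x, p), p)
    return sub(value, (1, 0), p)


def is_irreducible(c, p):
    """A cubic is irreducible over GF(p^2) exactly when it has no root there."""
    return all(F(c, (a, b), p) != (0, 0) for a in range(p) for b in range(p))


def count_irreducible(p):
    """Number of c in GF(p^2) for which F(c, X) is irreducible."""
    if p % 3 != 2:
        raise ValueError("this representation of GF(p^2) needs p = 2 mod 3")
    return sum(is_irreducible((a, b), p) for a in range(p) for b in range(p))


if __name__ == "__main__":
    for p in (5, 11, 17):
        n = count_irreducible(p)
        print(f"p = {p}: {n} of {p * p} values of c give an irreducible F(c, X)")
```

The counts follow from the correspondence stated above: each irreducible F(c, X) has three roots of order > 3 dividing $p^2 - p + 1$, so there are $(p^2 - p - 2)/3$ suitable values of c, a fraction that approaches one third as p grows.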

Thus, for the XTR parameter set-up process one needs to be able to test 
irreducibility of polynomials of the form $F(c, X) = X^3 - cX^2 + c^p X - 1 \in \mathrm{GF}(p^2)[X]$ 